Data Science Major

Jointly offered with the Department of Computer Science & Engineering, the BA in Data Science gives students the formal foundation needed to understand the applicability and consequences of the various approaches to analyzing data, with a focus on statistical modeling and machine learning.

Why choose the BA in Data Science?

Data science has emerged from the ongoing data revolution and from the challenges that standard mathematical and statistical approaches face when dealing with massive datasets, high dimensionality, and extremely complex data objects. Such datasets appear in modern applications ranging from medicine to climatology to the social sciences, to name just a few. Students trained in data science are already in high demand across a wide spectrum of industries. Data science is by nature interdisciplinary, requiring mastery of a variety of skills and concepts, including many traditionally associated with statistics, computer science, and mathematics. In crafting the BA in Data Science, SDS and CSE have sought to draw on existing courses as much as possible while judiciously introducing a handful of new courses that capture unique aspects at the intersection of the two disciplines. The program features a novel practicum component in which students undertake a mentored experience to apply their knowledge and skills in industry or research.

The Data Science Major requires 12 core courses, 4 elective courses, one course in Ethics and Professional Responsibility, and a practicum. More information, including course lists and details about the practicum, can be found in the Bulletin.

Sample courses:

Statistics for Data Science I & II

This two-course sequence covers the basics of probability, statistical modeling, and inference that provide the foundation for making sense of various data science methods. The first course covers concepts such as sample spaces, random variables and their joint distributions, statistical models, and various point and interval estimation techniques and their properties. The second course covers hypothesis testing, p-values, several computer-intensive methods such as the bootstrap and cross-validation, and generalized linear models and other multivariate models, with implementation in the statistical software package R.
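As an illustration of the kind of computer-intensive method mentioned above, the following is a minimal sketch of a percentile bootstrap confidence interval for a sample mean. It is written in Python with the standard library only (the course itself uses R), and the function name `bootstrap_ci`, the sample values, and the defaults of 2000 resamples and a 95% level are all assumptions chosen for the example, not course material.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"approximate 95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the shortest to write down.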

Matrix Algebra

An introductory course in linear algebra that focuses on Euclidean n-space, matrices, and related computations. Topics include: systems of linear equations, row reduction, matrix operations, determinants, linear independence, dimension, rank, change of basis, diagonalization, eigenvalues, eigenvectors, orthogonality, symmetric matrices, least-squares approximation, and quadratic forms. Introduction to abstract vector spaces.
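To give a flavor of how orthogonality and least-squares approximation connect, here is a small sketch that fits a line by solving the 2x2 normal equations directly. The helper `fit_line` and the data points are hypothetical, picked only for illustration; a real course or project would typically rely on a linear algebra library instead.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper).

    Solves the normal equations (A^T A) beta = A^T y for the design
    matrix A whose columns are all ones and the x values.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of A^T A
    if det == 0:
        raise ValueError("x values must not all be equal")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Made-up points lying roughly on a line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
intercept, slope = fit_line(xs, ys)

# The residual vector is orthogonal to both columns of A,
# so both sums below are zero up to rounding error.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(intercept, slope)
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))
```

The two near-zero sums are the orthogonality conditions that define the least-squares solution.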

Introduction to Machine Learning

The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. This course is a broad introduction to machine learning, covering the foundations of supervised learning and important supervised learning algorithms. Topics to be covered are the theory of generalization (including VC-dimension, the bias-variance tradeoff, validation, and regularization) and linear and non-linear learning models (including linear and logistic regression, decision trees, ensemble methods, neural networks, nearest-neighbor methods, and support vector machines).
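One of the simplest models in this list, the nearest-neighbor method, can be sketched in a few lines together with a hold-out validation check. The function `knn_predict`, the choice of k = 3, and the toy data are assumptions made for this example and are not taken from the course.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort labeled points by Euclidean distance to the query and keep k of them.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy two-class data: (features, label). Values are made up.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]
# Held-out points used only to estimate accuracy, never for voting.
validation = [((1.1, 1.1), "a"), ((3.9, 4.1), "b")]

correct = sum(knn_predict(train, x) == y for x, y in validation)
accuracy = correct / len(validation)
print(f"validation accuracy: {accuracy:.2f}")
```

Keeping the validation points out of the training set is what makes the accuracy figure an estimate of generalization rather than of memorization.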