Apr 29, 2024  
2020-2021 Undergraduate Catalog [ARCHIVED CATALOG]

DSCI 430 - Data Science at Scale


(3 credits)

This course introduces students to processing data at large scale on a distributed platform. Students first learn the functional approach to processing large data sets. Along the way, the course covers many of the techniques employed in large distributed data-processing systems, such as using common higher-order functions, employing lazy evaluation, and relying on immutable data structures. By the end of the course, students will be familiar with processing large amounts of data in one or more high-level languages (e.g., Python and/or Scala) and with a number of frameworks for distributed computation (e.g., Hadoop/MapReduce/Spark). Prerequisite: DSCI 330 - Management of Unstructured Data or instructor permission. Grade or P/NC. Offered alternate years.

