For the final project, you will be assigned to a team to conduct an exploratory data analysis of the U.S. Department of Education’s
For your third major homework assignment, you will use statistical inference to answer a question about the National Survey of Family Growth, Cycle 6 dataset published by the National Center for Health Statistics.
Introductory Statistics with Randomization and Simulation
Click here to download the textbook.
Chapter 2: Foundation for inference
Section 2.4 – Simulation case studies
Section 2.5 – Central Limit Theorem
Chapter 4: Inference for numerical data
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 5: Statistical inference with infer
A mini-homework for practicing how to conduct hypothesis tests and calculate confidence intervals using the infer package.
Introductory Statistics with Randomization and Simulation
Chapter 2: Foundation for inference
Introduction
Section 2.1 – Randomization case study: gender discrimination
Section 2.2 – Randomization case study: opportunity cost
Section 2.3 – Hypothesis testing
A mini-homework for practicing how to analyze data distributions using basic statistical functions in R, ggplot2, and dplyr.
For this module exercise, you will answer a series of questions that check your understanding of the material covered in the data distribution lectures.
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 4: Representing distributions
Introduction: http://book.cds101.com/representing-distributions.html
Section 4.1 – Probability mass functions: http://book.cds101.com/probability-mass-functions.html
Section 4.2 – Cumulative distribution functions: http://book.cds101.com/cumulative-distribution-functions.html
A mini-homework for practicing how to reshape datasets using the tidyr library.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 12: Tidy data
For your second major assignment, you will explore a dataset about the passengers on the Titanic, the British passenger liner that struck an iceberg during its maiden voyage and sank early in the morning on April 15, 1912.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 5: Data transformation
Section 5.6 – Grouped summaries with summarise(): https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise
Section 5.7 – Grouped mutates (and filters): https://r4ds.had.co.nz/transform.html#grouped-mutates-and-filters
A mini-homework for practicing how to manipulate datasets using the dplyr library.
For this module exercise, you will follow along with the examples from the Module 4 lecture videos in an R Markdown file.
Your first major assignment is a set of exercises based around a single dataset called rail_trail, which will provide you with practice in creating visualizations using R and ggplot2.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 4: Workflow: basics
Chapter 5: Data transformation
Section 5.1 – Introduction: https://r4ds.had.co.nz/transform.html#introduction-2
Section 5.2 – Filter rows with filter(): https://r4ds.had.co.nz/transform.html#filter-rows-with-filter
Section 5.3 – Arrange rows with arrange(): https://r4ds.had.co.nz/transform.html#arrange-rows-with-arrange
Section 5.4 – Select columns with select(): https://r4ds.had.co.nz/transform.html#select
Section 5.5 – Add new variables with mutate(): https://r4ds.had.co.nz/transform.html#add-new-variables-with-mutate
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 3: Data visualisation
Section 3.7 – Statistical transformations: https://r4ds.had.co.nz/data-visualisation.html#statistical-transformations
Section 3.8 – Position adjustments: https://r4ds.had.co.nz/data-visualisation.html#position-adjustments
Section 3.9 – Coordinate systems: https://r4ds.had.co.nz/data-visualisation.html#coordinate-systems
Section 3.10 – The layered grammar of graphics: https://r4ds.had.co.nz/data-visualisation.html#the-layered-grammar-of-graphics
Introductory Statistics with Randomization and Simulation
Chapter 1: Introduction to data
A mini-homework for practicing how to make plots using the ggplot2 library.
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 3: Data visualisation
Section 3.1 – Introduction: https://r4ds.had.co.nz/data-visualisation.html#introduction-1, skip subsection 3.1.1
Section 3.2 – First steps: https://r4ds.had.co.nz/data-visualisation.html#first-steps
Section 3.3 – Aesthetic mappings: https://r4ds.had.co.nz/data-visualisation.html#aesthetic-mappings
Section 3.4 – Common problems: https://r4ds.had.co.nz/data-visualisation.html#common-problems
Section 3.5 – Facets: https://r4ds.had.co.nz/data-visualisation.html#facets
Section 3.6 – Geometric objects: https://r4ds.had.co.nz/data-visualisation.html#geometric-objects
Introductory Statistics with Randomization and Simulation
Chapter 1: Introduction to data
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 3: Describing numerical data
A mini-homework on editing R Markdown files and saving to GitHub.
A mini-homework to practice using RStudio to run code blocks in R Markdown files and to create visualizations using ggplot2.
Introduction to computational and data sciences supplemental book
Book URL: http://book.cds101.com
Chapter 2: GitHub
Section 2.1 – Getting started: http://book.cds101.com/getting-started.html
Section 2.2 – Navigating the GitHub site: http://book.cds101.com/navigating-the-github-site.html
Section 2.3 – Repositories: http://book.cds101.com/repositories.html
R for Data Science
Book URL: http://r4ds.had.co.nz
Chapter 27: R Markdown
R Markdown: The Definitive Guide
Book URL: https://bookdown.org/yihui/rmarkdown
Chapter 2: Basics
Introduction: https://bookdown.org/yihui/rmarkdown/basics.html
Section 2.2 – Compile an R Markdown document: https://bookdown.org/yihui/rmarkdown/compile.html
Section 2.5 – Markdown syntax: https://bookdown.org/yihui/rmarkdown/markdown-syntax.html
Introduction to Data Science: Data Analysis and Prediction Algorithms with R
Book URL: https://rafalab.github.io/dsbook/
Chapter 77: RStudio
Section 77.1 – The panes: https://rafalab.github.io/dsbook/rstudio.html#the-panes
Section 77.2 – Key bindings: https://rafalab.github.io/dsbook/rstudio.html#key-bindings
Section 77.5 – Global options: https://rafalab.github.io/dsbook/rstudio.html#global-options
Section 77.6 – Keeping organized with RStudio projects: https://rafalab.github.io/dsbook/rstudio.html#keeping-organized-with-rstudio-projects
Section 77.7 – Using Git and GitHub in RStudio: https://rafalab.github.io/dsbook/rstudio.html#using-git-and-github-in-rstudio
A mini-homework about a data science study that used Twitter data to predict election outcomes.
For this module exercise, you will answer a series of questions that check your understanding of the material covered in the Module 1 lecture videos.
Introductory Statistics with Randomization and Simulation
Chapter 1: Introduction to data