Introduction
1. Data Science
1.1 Introduction
1.1.1 Computational Tools
1.1.2 Statistical Techniques
1.2 Why Data Science?
1.3 Plotting the Classics
1.3.1 Literary Characters
1.3.2 Another Kind of Character
2. Causality and Experiments
2.1 John Snow and the Broad Street Pump
2.2 Snow’s “Grand Experiment”
2.3 Establishing Causality
2.4 Randomization
2.5 Endnote
3. Programming in Python
3.1 Expressions
3.2 Names
3.2.1 Example: Growth Rates
3.3 Call Expressions
3.4 Introduction to Tables
4. Data Types
4.1 Numbers
4.2 Strings
4.2.1 String Methods
4.3 Comparisons
5. Sequences
5.1 Arrays
5.2 Ranges
5.3 More on Arrays
6. Tables
6.1 Sorting Rows
6.2 Selecting Rows
6.3 Example: Population Trends
6.4 Example: Trends in Gender
7. Visualization
7.1 Categorical Distributions
7.2 Numerical Distributions
7.3 Overlaid Graphs
8. Functions and Tables
8.1 Applying Functions to Columns
8.2 Classifying by One Variable
8.3 Cross-Classifying
8.4 Joining Tables by Columns
8.5 Bike Sharing in the Bay Area
9. Randomness
9.1 Conditional Statements
9.2 Iteration
9.3 Simulation
9.4 The Monty Hall Problem
9.5 Finding Probabilities
10. Sampling and Empirical Distributions
10.1 Empirical Distributions
10.2 Sampling from a Population
10.3 Empirical Distribution of a Statistic
11. Testing Hypotheses
11.1 Assessing Models
11.2 Multiple Categories
11.3 Decisions and Uncertainty
11.4 Error Probabilities
12. Comparing Two Samples
12.1 A/B Testing
12.2 Deflategate
12.3 Causality
13. Estimation
13.1 Percentiles
13.2 The Bootstrap
13.3 Confidence Intervals
13.4 Using Confidence Intervals
14. Why the Mean Matters
14.1 Properties of the Mean
14.2 Variability
14.3 The SD and the Normal Curve
14.4 The Central Limit Theorem
14.5 The Variability of the Sample Mean
14.6 Choosing a Sample Size
15. Prediction
15.1 Correlation
15.2 The Regression Line
15.3 The Method of Least Squares
15.4 Least Squares Regression
15.5 Visual Diagnostics
15.6 Numerical Diagnostics
16. Inference for Regression
16.1 A Regression Model
16.2 Inference for the True Slope
16.3 Prediction Intervals
17. Classification
17.1 Nearest Neighbors
17.2 Training and Testing
17.3 Rows of Tables
17.4 Implementing the Classifier
17.5 The Accuracy of the Classifier
17.6 Multiple Regression
18. Updating Predictions
18.1 A "More Likely Than Not" Binary Classifier
18.2 Making Decisions