Practical Statistics for Data Scientists
Description:
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you’ll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher-quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that "learn" from data
- Unsupervised learning methods for extracting meaning from unlabeled data
Other
- Authors: Peter Bruce, Andrew Bruce, Peter Gedeck
- Edition: 2
- Publication date: 10-04-2020
- Printable: 2 pages
- Copyable: 2 pages
- Format: ePub
- ISBN 13: 9781492072898
- Print ISBN: 9781492072942
- ISBN 10: 1492072893
Table of Contents
- Preface
- Conventions Used in This Book
- Using Code Examples
- O’Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Exploratory Data Analysis
- Elements of Structured Data
- Further Reading
- Rectangular Data
- Data Frames and Indexes
- Nonrectangular Data Structures
- Further Reading
- Estimates of Location
- Mean
- Median and Robust Estimates
- Example: Location Estimates of Population and Murder Rates
- Further Reading
- Estimates of Variability
- Standard Deviation and Related Estimates
- Estimates Based on Percentiles
- Example: Variability Estimates of State Population
- Further Reading
- Exploring the Data Distribution
- Percentiles and Boxplots
- Frequency Tables and Histograms
- Density Plots and Estimates
- Further Reading
- Exploring Binary and Categorical Data
- Mode
- Expected Value
- Probability
- Further Reading
- Correlation
- Scatterplots
- Further Reading
- Exploring Two or More Variables
- Hexagonal Binning and Contours (Plotting Numeric Versus Numeric Data)
- Two Categorical Variables
- Categorical and Numeric Data
- Visualizing Multiple Variables
- Further Reading
- Summary
- 2. Data and Sampling Distributions
- Random Sampling and Sample Bias
- Bias
- Random Selection
- Size Versus Quality: When Does Size Matter?
- Sample Mean Versus Population Mean
- Further Reading
- Selection Bias
- Regression to the Mean
- Further Reading
- Sampling Distribution of a Statistic
- Central Limit Theorem
- Standard Error
- Further Reading
- The Bootstrap
- Resampling Versus Bootstrapping
- Further Reading
- Confidence Intervals
- Further Reading
- Normal Distribution
- Standard Normal and QQ-Plots
- Long-Tailed Distributions
- Further Reading
- Student’s t-Distribution
- Further Reading
- Binomial Distribution
- Further Reading
- Chi-Square Distribution
- Further Reading
- F-Distribution
- Further Reading
- Poisson and Related Distributions
- Poisson Distributions
- Exponential Distribution
- Estimating the Failure Rate
- Weibull Distribution
- Further Reading
- Summary
- 3. Statistical Experiments and Significance Testing
- A/B Testing
- Why Have a Control Group?
- Why Just A/B? Why Not C, D,…?
- Further Reading
- Hypothesis Tests
- The Null Hypothesis
- Alternative Hypothesis
- One-Way Versus Two-Way Hypothesis Tests
- Further Reading
- Resampling
- Permutation Test
- Example: Web Stickiness
- Exhaustive and Bootstrap Permutation Tests
- Permutation Tests: The Bottom Line for Data Science
- Further Reading
- Statistical Significance and p-Values
- p-Value
- Alpha
- Type 1 and Type 2 Errors
- Data Science and p-Values
- Further Reading
- t-Tests
- Further Reading
- Multiple Testing
- Further Reading
- Degrees of Freedom
- Further Reading
- ANOVA
- F-Statistic
- Two-Way ANOVA
- Further Reading
- Chi-Square Test
- Chi-Square Test: A Resampling Approach
- Chi-Square Test: Statistical Theory
- Fisher’s Exact Test
- Relevance for Data Science
- Further Reading
- Multi-Arm Bandit Algorithm
- Further Reading
- Power and Sample Size
- Sample Size
- Further Reading
- Summary
- 4. Regression and Prediction
- Simple Linear Regression
- The Regression Equation
- Fitted Values and Residuals
- Least Squares
- Prediction Versus Explanation (Profiling)
- Further Reading
- Multiple Linear Regression
- Example: King County Housing Data
- Assessing the Model
- Cross-Validation
- Model Selection and Stepwise Regression
- Weighted Regression
- Further Reading
- Prediction Using Regression
- The Dangers of Extrapolation
- Confidence and Prediction Intervals
- Factor Variables in Regression
- Dummy Variables Representation
- Factor Variables with Many Levels
- Ordered Factor Variables
- Interpreting the Regression Equation
- Correlated Predictors
- Multicollinearity
- Confounding Variables
- Interactions and Main Effects
- Regression Diagnostics
- Outliers
- Influential Values
- Heteroskedasticity, Non-Normality, and Correlated Errors
- Partial Residual Plots and Nonlinearity
- Polynomial and Spline Regression
- Polynomial
- Splines
- Generalized Additive Models
- Further Reading
- Summary
- 5. Classification
- Naive Bayes
- Why Exact Bayesian Classification Is Impractical
- The Naive Solution
- Numeric Predictor Variables
- Further Reading
- Discriminant Analysis
- Covariance Matrix
- Fisher’s Linear Discriminant
- A Simple Example
- Further Reading
- Logistic Regression
- Logistic Response Function and Logit
- Logistic Regression and the GLM
- Generalized Linear Models
- Predicted Values from Logistic Regression
- Interpreting the Coefficients and Odds Ratios
- Linear and Logistic Regression: Similarities and Differences
- Assessing the Model
- Further Reading
- Evaluating Classification Models
- Confusion Matrix
- The Rare Class Problem
- Precision, Recall, and Specificity
- ROC Curve
- AUC
- Lift
- Further Reading
- Strategies for Imbalanced Data
- Undersampling
- Oversampling and Up/Down Weighting
- Data Generation
- Cost-Based Classification
- Exploring the Predictions
- Further Reading
- Summary
- 6. Statistical Machine Learning
- K-Nearest Neighbors
- A Small Example: Predicting Loan Default
- Distance Metrics
- One Hot Encoder
- Standardization (Normalization, z-Scores)
- Choosing K
- KNN as a Feature Engine
- Tree Models
- A Simple Example
- The Recursive Partitioning Algorithm
- Measuring Homogeneity or Impurity
- Stopping the Tree from Growing
- Predicting a Continuous Value
- How Trees Are Used
- Further Reading
- Bagging and the Random Forest
- Bagging
- Random Forest
- Variable Importance
- Hyperparameters
- Boosting
- The Boosting Algorithm
- XGBoost
- Regularization: Avoiding Overfitting
- Hyperparameters and Cross-Validation
- Summary
- 7. Unsupervised Learning
- Principal Components Analysis
- A Simple Example
- Computing the Principal Components
- Interpreting Principal Components
- Correspondence Analysis
- Further Reading
- K-Means Clustering
- A Simple Example
- K-Means Algorithm
- Interpreting the Clusters
- Selecting the Number of Clusters
- Hierarchical Clustering
- A Simple Example
- The Dendrogram
- The Agglomerative Algorithm
- Measures of Dissimilarity
- Model-Based Clustering
- Multivariate Normal Distribution
- Mixtures of Normals
- Selecting the Number of Clusters
- Further Reading
- Scaling and Categorical Variables
- Scaling the Variables
- Dominant Variables
- Categorical Data and Gower’s Distance
- Problems with Clustering Mixed Data
- Summary
- Bibliography
- Index