Bayesian Data Analysis

Course
- STÆ529M Hagnýt Bayesísk tölfræði (Applied Bayesian Statistics).
English description:
Winner of the 2016 DeGroot Prize from the International Society for Bayesian Analysis. Now in its third edition, this classic book is widely considered the leading text on Bayesian methods, lauded for its accessible, practical approach to analyzing data and solving research problems. Bayesian Data Analysis, Third Edition continues to take an applied approach to analysis using up-to-date Bayesian methods.
The authors—all leaders in the statistics community—introduce basic concepts from a data-analytic perspective before presenting advanced methods. Throughout the text, numerous worked examples drawn from real applications and research emphasize the use of Bayesian inference in practice.
New to the Third Edition:
- Four new chapters on nonparametric modeling
- Coverage of weakly informative priors and boundary-avoiding priors
- Updated discussion of cross-validation and predictive information criteria
- Improved convergence monitoring and effective sample size calculations for iterative simulation
- Presentations of Hamiltonian Monte Carlo, variational Bayes, and expectation propagation
- New and revised software code
The book can be used in three different ways.
For undergraduate students, it introduces Bayesian inference starting from first principles. For graduate students, the text presents effective current approaches to Bayesian modeling and computation in statistics and related fields. For researchers, it provides an assortment of Bayesian methods in applied statistics. Additional materials, including data sets used in the examples, solutions to selected exercises, and software instructions, are available on the book’s web page.
Other information
- Authors: Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin
- Edition: 3
- Publication date: 2013-11-27
- Printing allowed: 2 pages
- Copying allowed: 2 pages
- Format: ePub
- ISBN 13: 9781439840962
- Print ISBN: 9781439840955
- ISBN 10: 1439840962
Table of contents
- Front Matter
- Preface
- Changes for the third edition
- Online information
- Acknowledgments
- Fundamentals of Bayesian Inference
- Chapter 1 Probability and inference
- 1.1 The three steps of Bayesian data analysis
- 1.2 General notation for statistical inference
- Parameters, data, and predictions
- Observational units and variables
- Exchangeability
- Explanatory variables
- Hierarchical modeling
- 1.3 Bayesian inference
- Probability notation
- Bayes’ rule
- Prediction
- Likelihood
- Likelihood and odds ratios
- 1.4 Discrete probability examples: genetics and spell checking
- Inference about a genetic status
- Spelling correction
- 1.5 Probability as a measure of uncertainty
- Subjectivity and objectivity
- 1.6 Example of probability assignment: football point spreads
- Figure 1.1 Scatterplot of actual outcome vs. point spread for each of 672 professional football games. The x and y coordinates are jittered by adding uniform random numbers to each point's coordinates (between −0.1 and 0.1 for the x coordinate; between −0.2 and 0.2 for the y coordinate) in order to display multiple values but preserve the discrete-valued nature of the data.
- Football point spreads and game outcomes
- Assigning probabilities based on observed frequencies
- Figure 1.2 (a) Scatterplot of (actual outcome − point spread) vs. point spread for each of 672 professional football games (with uniform random jitter added to x and y coordinates). (b) Histogram of the differences between the game outcome and the point spread, with the N(0, 14²) density superimposed.
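The parametric model in Figure 1.2(b), outcome − point spread ~ N(0, 14²), directly yields approximate pre-game win probabilities: a team favored by x points wins with probability about Φ(x/14). A minimal sketch (not from the book; the function names are ours):

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def win_probability(point_spread: float, sigma: float = 14.0) -> float:
    """P(favorite wins) under the model (outcome - spread) ~ N(0, sigma^2):
    the favorite's outcome is N(spread, sigma^2) and it wins when that
    outcome is positive, so P(win) = Phi(spread / sigma)."""
    return normal_cdf(point_spread / sigma)

# A team favored by 3.5 points wins with probability about 0.60.
print(win_probability(3.5))
```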
- A parametric model for the difference between outcome and point spread
- Assigning probabilities using the parametric model
- 1.7 Example: estimating the accuracy of record linkage
- Existing methods for assigning scores to potential matches
- Figure 1.3 Histograms of weight scores y for true and false matches in a sample of records from the 1988 test Census. Most of the matches in the sample are true (because a pre-screening process has already picked these as the best potential match for each case), and the two distributions are mostly, but not completely, separated.
- Estimating match probabilities empirically
- Figure 1.4 Lines show expected false-match rate (and 95% bounds) as a function of the proportion of cases declared matches, based on the mixture model for record linkage. Dots show the actual false-match rate for the data.
- External validation of the probabilities using test data
- Figure 1.5 Expansion of Figure 1.4 in the region where the estimated and actual match rates change rapidly. In this case, it would seem a good idea to match about 88% of the cases and send the rest to followup.
- 1.8 Some useful results from probability theory
- Modeling using conditional probability
- Means and variances of conditional distributions
- Transformation of variables
- 1.9 Computation and software
- Summarizing inferences by simulation
- Sampling using the inverse cumulative distribution function
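The inverse-CDF method listed above turns a Uniform(0, 1) draw u into a draw F⁻¹(u) from any target distribution with an invertible CDF F. A minimal sketch for the exponential distribution (our illustrative choice, not an example from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_exponential(rate: float, size: int) -> np.ndarray:
    """Draw from Exponential(rate) by inverting its CDF:
    F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate."""
    u = rng.uniform(size=size)
    return -np.log(1.0 - u) / rate

draws = sample_exponential(rate=2.0, size=10_000)
print(draws.mean())  # should be close to 1/rate = 0.5
```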
- Simulation of posterior and posterior predictive quantities
- Table 1.1 Structure of posterior and posterior predictive simulations. The superscripts are indexes, not powers.
- Chapter 2 Single-parameter models
- 2.1 Estimating a probability from binomial data
- Example. Estimating the probability of a female birth
- Figure 2.1 Unnormalized posterior density for binomial parameter θ, based on uniform prior distribution and y successes out of n trials. Curves displayed for several values of n and y.
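Figure 2.1's curves follow from the unnormalized posterior p(θ|y) ∝ θ^y (1 − θ)^(n−y) under the uniform prior, which can be evaluated on a grid. A minimal sketch (the grid resolution and the particular y and n are our choices):

```python
import numpy as np

def unnormalized_binomial_posterior(y: int, n: int, grid_size: int = 1000):
    """Evaluate theta^y * (1 - theta)^(n - y) on a grid over (0, 1),
    working on the log scale for numerical stability."""
    theta = np.linspace(0.001, 0.999, grid_size)
    log_post = y * np.log(theta) + (n - y) * np.log(1.0 - theta)
    post = np.exp(log_post - log_post.max())  # rescaled so the mode is 1
    return theta, post

theta, post = unnormalized_binomial_posterior(y=6, n=10)
print(theta[post.argmax()])  # posterior mode, close to y/n = 0.6
```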
- Historical note: Bayes and Laplace
- Prediction
- 2.2 Posterior as compromise between data and prior information
- 2.3 Summarizing posterior inference
- Figure 2.2 Hypothetical density for which the 95% central interval and 95% highest posterior density region dramatically differ: (a) central posterior interval, (b) highest posterior density region.
- Posterior quantiles and intervals
- 2.4 Informative prior distributions
- Binomial example with different prior distributions
- Conjugate prior distributions
- Nonconjugate prior distributions
- Conjugate prior distributions, exponential families, and sufficient statistics
- Example. Probability of a girl birth given placenta previa
- Figure 2.3 Draws from the posterior distribution of (a) the probability of female birth, θ; (b) the logit transform, logit(θ); (c) the male-to-female sex ratio, φ = (1 − θ)/θ.
- Table 2.1 Summaries of the posterior distribution of θ, the probability of a girl birth given placenta previa, under a variety of conjugate prior distributions.
- Figure 2.4 (a) Prior density for θ in an example nonconjugate analysis of birth ratio example; (b) histogram of 1000 draws from a discrete approximation to the posterior density. Figures are plotted on different scales.
- Likelihood of one data point
- Conjugate prior and posterior distributions
- Posterior predictive distribution
- Normal model with multiple observations
- Normal distribution with known mean but unknown variance
- Poisson model
- Poisson model parameterized in terms of rate and exposure
- Estimating a rate from Poisson data: an idealized example
- Figure 2.5 Posterior density for θ, the asthma mortality rate in cases per 100,000 persons per year, with a Gamma(3.0, 5.0) prior distribution: (a) given y = 3 deaths out of 200,000 persons; (b) given y = 30 deaths in 10 years for a constant population of 200,000. The histograms appear jagged because they are constructed from only 1000 random draws from the posterior distribution in each case.
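The caption describes the conjugate Gamma-Poisson update: with a Gamma(α, β) prior on the rate θ and y events observed over exposure x, the posterior is Gamma(α + y, β + x), here Gamma(6, 7) in case (a) and Gamma(33, 25) in case (b). A minimal sketch of the 1000-draw simulation (note that NumPy's gamma generator takes scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma_poisson_draws(alpha, beta, y, exposure, n_draws=1000):
    """Posterior draws for a Poisson rate theta with a Gamma(alpha, beta)
    prior, given y events in the stated exposure:
    theta | y ~ Gamma(alpha + y, beta + exposure)."""
    return rng.gamma(shape=alpha + y, scale=1.0 / (beta + exposure),
                     size=n_draws)

# (a) y = 3 deaths, exposure 2 (200,000 person-years in units of 100,000)
draws_a = gamma_poisson_draws(alpha=3.0, beta=5.0, y=3, exposure=2.0)
# (b) y = 30 deaths over 10 years of the same population, exposure 20
draws_b = gamma_poisson_draws(alpha=3.0, beta=5.0, y=30, exposure=20.0)
print(draws_a.mean(), draws_b.mean())  # near 6/7 and 33/25
```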
- Figure 2.6 The counties of the United States with the highest 10% age-standardized death rates for cancer of kidney/ureter for U.S. white males, 1980–1989. Why are most of the shaded counties in the middle of the country? See Section 2.7 for discussion.
- A puzzling pattern in a map
- Figure 2.7 The counties of the United States with the lowest 10% age-standardized death rates for cancer of kidney/ureter for U.S. white males, 1980–1989. Surprisingly, the pattern is somewhat similar to the map of the highest rates, shown in Figure 2.6.
- Bayesian inference for the cancer death rates
- Relative importance of the local data and the prior distribution
- Figure 2.8 (a) Kidney cancer death rates yj/(10nj) vs. population size nj. (b) Replotted on the scale of log10 population to see the data more clearly. The patterns come from the discreteness of the data (yj = 0, 1, 2, ...).
- Figure 2.9 (a) Bayes-estimated posterior mean kidney cancer death rates, vs. logarithm of population size nj, for the 3071 counties in the U.S. (b) Posterior medians and 50% intervals for θj for a sample of 100 counties j. The scales on the y-axes differ from the plots in Figure 2.8b.
- Constructing a prior distribution
- Figure 2.10 Empirical distribution of the age-adjusted kidney cancer death rates, for the 3071 counties in the U.S., along with the Gamma(20, 430,000) prior distribution for the underlying cancer rates θj.
- Proper and improper prior distributions
- Improper prior distributions can lead to proper posterior distributions
- Jeffreys’ invariance principle
- Various noninformative prior distributions for the binomial parameter
- Pivotal quantities
- Difficulties with noninformative prior distributions
- Constructing a weakly informative prior distribution
- Table 2.2 Worldwide airline fatalities, 1976–1985. Death rate is passenger deaths per 100 million passenger miles. Source: Statistical Abstract of the United States.
- Chapter 3 Introduction to multiparameter models
- 3.1 Averaging over ‘nuisance parameters’
- 3.2 Normal data with a noninformative prior distribution
- A noninformative prior distribution
- The conditional posterior distribution, p(μ|σ², y)
- The marginal posterior distribution, p(σ²|y)
- Sampling from the joint posterior distribution
- Analytic form of the marginal posterior distribution of μ
- Posterior predictive distribution for a future observation
- Example. Estimating the speed of light
- Figure 3.1 Histogram of Simon Newcomb's measurements for estimating the speed of light, from Stigler (1977). The data are recorded as deviations from 24,800 nanoseconds.
- A family of conjugate prior distributions
- The joint posterior distribution, p(μ, σ²|y)
- The conditional posterior distribution, p(μ|σ², y)
- The marginal posterior distribution, p(σ²|y)
- Sampling from the joint posterior distribution
- Analytic form of the marginal posterior distribution of μ
- Example. Pre-election polling
- Figure 3.2 Histogram of values of (θ1 − θ2) for 1000 simulations from the posterior distribution for the election polling example.
- Multivariate normal likelihood
- Conjugate analysis
- Conjugate inverse-Wishart family of prior distributions
- Different noninformative prior distributions
- Table 3.1: Bioassay data from Racine et al. (1986).
- Scaled inverse-Wishart model
- The scientific problem and the data
- Modeling the dose–response relation
- The likelihood
- The prior distribution
- Figure 3.3 (a) Contour plot for the posterior density of the parameters in the bioassay example. Contour lines are at 0.05, 0.15,..., 0.95 times the density at the mode. (b) Scatterplot of 1000 draws from the posterior distribution.
- A rough estimate of the parameters
- Obtaining a contour plot of the joint posterior density
- Sampling from the joint posterior distribution
- Figure 3.4 Histogram of the draws from the posterior distribution of the LD50 (on the scale of log dose in g/ml) in the bioassay example, conditional on the parameter β being positive.
- The posterior distribution of the LD50
- Table 3.2 Number of respondents in each preference category from ABC News pre- and post-debate surveys in 1988.
- Table 3.3 Counts of bicycles and other vehicles in one hour in each of 10 city blocks in each of six categories. (The data for two of the residential blocks were lost.) For example, the first block had 16 bicycles and 58 other vehicles, the second had 9 bicycles and 90 other vehicles, and so on. Streets were classified as ‘residential,’ ‘fairly busy,’ or ‘busy’ before the data were gathered.
- Chapter 4 Asymptotics and connections to non-Bayesian approaches
- 4.1 Normal approximations to the posterior distribution
- Normal approximation to the joint posterior distribution
- Example. Normal distribution with unknown mean and variance
- Interpretation of the posterior density function relative to its maximum
- Summarizing posterior distributions by point estimates and standard errors
- Data reduction and summary statistics
- Lower-dimensional normal approximations
- Figure 4.1 (a) Contour plot of the normal approximation to the posterior distribution of the parameters in the bioassay example. Contour lines are at 0.05, 0.15,..., 0.95 times the density at the mode. Compare to Figure 3.3a. (b) Scatterplot of 1000 draws from the normal approximation to the posterior distribution. Compare to Figure 3.3b.
- Example. Bioassay experiment (continued)
- Figure 4.2 (a) Histogram of the simulations of LD50, conditional on β > 0, in the bioassay example based on the normal approximation p(α, β|y). The wide tails of the histogram correspond to values of β close to 0. Omitted from this histogram are five simulation draws with values of LD50 less than −2 and four draws with values greater than 2; the extreme tails are truncated to make the histogram visible. The values of LD50 for the 950 simulation draws corresponding to β > 0 had a range of [-12.4, 5.4]. Compare to Figure 3.4. (b) Histogram of the central 95% of the distribution.
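The LD50 summarized in Figures 3.4 and 4.2 is the dose at which the probability of death is 50%; solving logit(0.5) = α + βx gives LD50 = −α/β, computed only for posterior draws with β > 0. A minimal sketch of this post-processing step (the stand-in draws below are placeholders for the real posterior simulations):

```python
import numpy as np

def ld50_draws(alpha: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """LD50 = -alpha/beta on the log-dose scale, computed only for
    posterior draws with beta > 0; unless the death probability
    increases with dose, the LD50 is not a meaningful summary."""
    positive = beta > 0
    return -alpha[positive] / beta[positive]

# Stand-in draws for illustration only; in a real analysis these would
# come from grid or MCMC simulation of the joint posterior of (alpha, beta).
rng = np.random.default_rng(2)
alpha = rng.normal(0.8, 1.0, size=1000)
beta = rng.normal(7.7, 4.9, size=1000)
print(np.percentile(ld50_draws(alpha, beta), [2.5, 50, 97.5]))
```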
- Notation and mathematical setup
- Asymptotic normality and consistency
- Likelihood dominating the prior distribution
- Large-sample correspondence
- Point estimation, consistency, and efficiency
- Confidence coverage
- Maximum likelihood and other point estimates
- Unbiased estimates
- Example. Prediction using regression
- Confidence intervals
- Hypothesis testing
- Multiple comparisons and multilevel modeling
- Nonparametric methods, permutation tests, jackknife, bootstrap
- Example. The Wilcoxon rank test
- Chapter 5 Hierarchical models
- Table 5.1 Tumor incidence in historical control groups and current group of rats, from Tarone (1982). The table displays the values of (number of rats with tumors)/(total number of rats).
- 5.1 Constructing a parameterized prior distribution
- Analyzing a single experiment in the context of historical data
- Example. Estimating the risk of tumor in a group of rats
- Figure 5.1: Structure of the hierarchical model for the rat tumor example.
- Logic of combining information
- Exchangeability
- Example. Exchangeability and sampling
- Exchangeability when additional information is available on the units
- Objections to exchangeable models
- The full Bayesian treatment of the hierarchical model
- The hyperprior distribution
- Posterior predictive distributions
- Analytic derivation of conditional and marginal distributions
- Drawing simulations from the posterior distribution
- Application to the model for rat tumors
- Figure 5.2 First try at a contour plot of the marginal posterior density of (log(α/β), log(α+β)) for the rat tumor example. Contour lines are at 0.05, 0.15, ..., 0.95 times the density at the mode.
- Figure 5.3 (a) Contour plot of the marginal posterior density of (log(α/β), log(α+β)) for the rat tumor example. Contour lines are at 0.05, 0.15, ..., 0.95 times the density at the mode. (b) Scatterplot of 1000 draws from the numerically computed marginal posterior density.
- Figure 5.4 Posterior medians and 95% intervals of rat tumor rates, θj (plotted vs. observed tumor rates yj/nj), based on simulations from the joint posterior distribution. The 45° line corresponds to the unpooled estimates, yj/nj. The horizontal positions of the points have been jittered to reduce overlap.
- The data structure
- Constructing a prior distribution from pragmatic considerations
- The hierarchical model
- The joint posterior distribution
- The conditional posterior distribution of the normal means, given the hyperparameters
- The marginal posterior distribution of the hyperparameters
- Computation
- Posterior predictive distributions
- Difficulty with a natural non-Bayesian estimate of the hyperparameters
- Inferences based on nonhierarchical models and their problems
- Table 5.2 Observed effects of special preparation on SAT-V scores in eight randomized experiments. Estimates are based on separate analyses for the eight experiments.
- Figure 5.5 Marginal posterior density, p(τ|y), for standard deviation of the population of school effects θj in the educational testing example.
- Posterior simulation under the hierarchical model
- Results
- Figure 5.6 Conditional posterior means of treatment effects, E(θj|τ,y), as functions of the between-school standard deviation τ, for the educational testing example. The line for school C crosses the lines for E and F because C has a higher measurement error (see Table 5.2) and its estimate is therefore shrunk more strongly toward the overall mean in the Bayesian analysis.
- Figure 5.7 Conditional posterior standard deviations of treatment effects, sd(θj|τ, y), as functions of the between-school standard deviation τ, for the educational testing example.
- Discussion
- Table 5.3: Summary of 200 simulations of the treatment effects in the eight schools.
- Figure 5.8 Histograms of two quantities of interest computed from the 200 simulation draws: (a) the effect in school A, θ1; (b) the largest effect, max{θj}. The jaggedness of the histograms is just an artifact caused by sampling variability from using only 200 random draws.
- Table 5.4 Results of 22 clinical trials of beta-blockers for reducing mortality after myocardial infarction, with empirical log-odds and approximate sampling variances. Data from Yusuf et al. (1985). Posterior quantiles of treatment effects are based on 5000 draws from a Bayesian hierarchical model described here. Negative effects correspond to reduced probability of death under the treatment.
- Defining a parameter for each study
- A normal approximation to the likelihood
- Goals of inference in meta-analysis
- What if exchangeability is inappropriate?
- A hierarchical normal model
- Table 5.5 Summary of posterior inference for the overall mean and standard deviation of study effects, and for the predicted effect in a hypothetical future study, from the meta-analysis of the beta-blocker trials in Table 5.4. All effects are on the log-odds scale.
- Results of the analysis and comparison to simpler methods
- Concepts relating to the choice of prior distribution
- Classes of noninformative and weakly informative prior distributions for hierarchical variance parameters
- Application to the 8-schools example
- Figure 5.9 Histograms of posterior simulations of the between-school standard deviation, τ, from models with three different prior distributions: (a) uniform prior distribution on τ, (b) inverse-gamma(1, 1) prior distribution on τ², (c) inverse-gamma(0.001, 0.001) prior distribution on τ². Overlain on each is the corresponding prior density function for τ. (For models (b) and (c), the density for τ is calculated using the gamma density function multiplied by the Jacobian of the 1/τ² transformation.) In models (b) and (c), posterior inferences are strongly constrained by the prior distribution.
- Weakly informative prior distribution for the 3-schools problem
- Figure 5.10 Histograms of posterior simulations of the between-school standard deviation, τ, from models for the 3-schools data with two different prior distributions on τ: (a) uniform (0, ∞), (b) half-Cauchy with scale 25, set as a weakly informative prior distribution given that τ was expected to be well below 100. The histograms are not on the same scales. Overlain on each histogram is the corresponding prior density function. With only J = 3 groups, the noninformative uniform prior distribution is too weak, and the proper Cauchy distribution works better, without appearing to distort inferences in the area of high likelihood.
- Fundamentals of Bayesian Data Analysis
- Chapter 6 Model checking
- 6.1 The place of model checking in applied Bayesian statistics
- Sensitivity analysis and model improvement
- Judging model flaws by their practical implications
- 6.2 Do the inferences from the model make sense?
- Example. Evaluating election predictions by comparing to substantive political knowledge
- External validation
- Figure 6.1 Summary of a forecast of the 1992 U.S. presidential election performed one month before the election. For each state, the proportion of the box that is shaded represents the estimated probability of Clinton winning the state; the width of the box is proportional to the number of electoral votes for the state.
- Choices in defining the predictive quantities
- 6.3 Posterior predictive checking
- Example. Comparing Newcomb's speed of light measurements to the posterior predictive distribution
- Figure 6.2 Twenty replications, y^rep, of the speed of light data from the posterior predictive distribution, p(y^rep|y); compare to observed data, y, in Figure 3.1. Each histogram displays the result of drawing 66 independent values from a common normal distribution with mean and variance (μ, σ²) drawn from the posterior distribution, p(μ, σ²|y), under the normal model.
- Figure 6.3 Smallest observation of Newcomb's speed of light data (the vertical line at the left of the graph), compared to the smallest observations from each of the 20 posterior predictive simulated datasets displayed in Figure 6.2.
- Notation for replications
- Test quantities
- Tail-area probabilities
- Figure 6.4 Realized vs. posterior predictive distributions for two more test quantities in the speed of light example: (a) Sample variance (vertical line at 115.5), compared to 200 simulations from the posterior predictive distribution of the sample variance. (b) Scatterplot showing prior and posterior simulations of a test quantity: T(y, θ) = |y(61) − θ| − |y(6) − θ| (horizontal axis) vs. T(y^rep, θ) = |y^rep(61) − θ| − |y^rep(6) − θ| (vertical axis), based on 200 simulations from the posterior distribution of (θ, y^rep). The p-value is computed as the proportion of points in the upper-left half of the scatterplot.
- Choosing test quantities
- Example. Checking the assumption of independence in binomial trials
- Figure 6.5 Observed number of switches (vertical line at T(y) = 3), compared to 10,000 simulations from the posterior predictive distribution of the number of switches, T(y^rep).
- Example. Checking the fit of hierarchical regression models for adolescent smoking
- Figure 6.6 Prevalence of regular (daily) smoking among participants responding at each wave in the study of Australian adolescents (who were on average 15 years old at wave 1).
- Table 6.1 Summary of posterior predictive checks for three test statistics for two models fit to the adolescent smoking data: (1) hierarchical logistic regression, and (2) hierarchical logistic regression with a mixture component for never-smokers. The second model better fits the percentages of never- and always-smokers, but still has a problem with the percentage of ‘incident smokers,’ who are defined as persons who report incidents of nonsmoking followed by incidents of smoking.
- Multiple comparisons
- Interpreting posterior predictive p-values
- Limitations of posterior tests
- P-values and u-values
- Model checking and the likelihood principle
- Marginal predictive checks
- 6.4 Graphical posterior predictive checks
- Figure 6.7 Left column displays observed data y (a 15 × 23 array of binary responses from each of 6 persons); right columns display seven replicated datasets yrep from a fitted logistic regression model. A misfit of model to data is apparent: the data show strong row and column patterns for individual persons (for example, the nearly white row near the middle of the last person's data) that do not appear in the replicates. (To make such patterns clearer, the indexes of the observed and each replicated dataset have been arranged in increasing order of average response.)
- Direct data display
- Figure 6.8 Redisplay of Figure 6.7 without ordering the rows, columns, and persons in order of increasing response. Once again, the left column shows the observed data and the right columns show replicated datasets from the model. Without the ordering, it is difficult to notice the discrepancies between data and model, which are easily apparent in Figure 6.7.
- Displaying summary statistics or inferences
- Figure 6.9 Histograms of (a) 90 patient parameters and (b) 69 symptom parameters, from a single draw from the posterior distribution of a psychometric model. These histograms of posterior estimates contradict the assumed Beta(2, 2) prior densities (overlain on the histograms) for each batch of parameters, and motivated us to switch to mixture prior distributions. This implicit comparison to the values under the prior distribution can be viewed as a posterior predictive check in which a new set of patients and a new set of symptoms are simulated.
- Figure 6.10 Histograms of (a) 90 patient parameters and (b) 69 symptom parameters, as estimated from an expanded psychometric model. The mixture prior densities (overlain on the histograms) are not perfect, but they approximate the corresponding histograms much better than the Beta(2, 2) densities in Figure 6.9.
- Residual plots and binned residual plots
- Figure 6.11 (a) Residuals (observed − expected) vs. expected values for a model of pain relief scores (0 = no pain relief, ..., 5 = complete pain relief). (b) Average residuals vs. expected pain scores, with measurements divided into 20 equally sized bins defined by ranges of expected pain scores. The average prediction errors are relatively small (note the scale of the y-axis), but with a consistent pattern that low predictions are too low and high predictions are too high. Dotted lines show 95% bounds under the model.
- General interpretation of graphs as model checks
- 6.5 Model checking for the educational testing example
- Assumptions of the model
- Comparing posterior inferences to substantive knowledge
- Posterior predictive checking
- Sensitivity analysis
- Figure 6.12 Posterior predictive distribution, observed result, and p-value for each of four test statistics for the educational testing example.
- Chapter 7 Evaluating, comparing, and expanding models
- Example. Forecasting presidential elections
- Figure 7.1 Douglas Hibbs's ‘bread and peace’ model of voting and the economy. Presidential elections since 1952 are listed in order of the economic performance at the end of the preceding administration (as measured by inflation-adjusted growth in average personal income). The better the economy, the better the incumbent party's candidate generally does, with the biggest exceptions being 1952 (Korean War) and 1968 (Vietnam War).
- 7.1 Measures of predictive accuracy
- Predictive accuracy for a single data point
- Averaging over the distribution of future data
- Evaluating predictive accuracy for a fitted model
- Choices in defining the likelihood and predictive quantities
- 7.2 Information criteria and cross-validation
- Estimating out-of-sample predictive accuracy using available data
- Log predictive density asymptotically, or for normal linear models
- Figure 7.2 Posterior distribution of the log predictive density log p(y|θ) for the election forecasting example. The variation comes from posterior uncertainty in θ. The maximum value of the distribution, −40.3, is the log predictive density when θ is at the maximum likelihood estimate. The mean of the distribution is −42.0, and the difference between the mean and the maximum is 1.7, which is close to the value of 3/2 that would be predicted from asymptotic theory, given that we are estimating 3 parameters (two coefficients and a residual variance).
- Example. Fit of the election forecasting model: Bayesian inference
- Akaike information criterion (AIC)
- Deviance information criterion (DIC) and effective number of parameters
- Watanabe-Akaike or widely applicable information criterion (WAIC)
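WAIC, as used in Table 7.1 below, starts from the log pointwise predictive density (lppd) and subtracts an effective number of parameters; the pWAIC2 correction is the sum over data points of the posterior variance of the log predictive density. A minimal sketch, assuming an S × n matrix of log-likelihood values log p(yᵢ|θˢ) from posterior simulations is already available:

```python
import numpy as np

def waic(log_lik: np.ndarray) -> dict:
    """WAIC from an (S draws) x (n points) matrix of log p(y_i | theta^s).

    lppd_i   = log( mean_s exp(log_lik[s, i]) )      computed stably
    p_waic_i = var_s( log_lik[s, i] )                the pWAIC2 form
    waic     = -2 * (sum_i lppd_i - sum_i p_waic_i)  deviance scale
    """
    max_ll = log_lik.max(axis=0)
    lppd = max_ll + np.log(np.exp(log_lik - max_ll).mean(axis=0))
    p_waic = log_lik.var(axis=0, ddof=1)
    return {"lppd": lppd.sum(), "p_waic": p_waic.sum(),
            "waic": -2.0 * (lppd.sum() - p_waic.sum())}

# Illustration with simulated log-likelihoods: 500 draws, 8 data points.
rng = np.random.default_rng(3)
print(waic(rng.normal(-1.0, 0.3, size=(500, 8))))
```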
- Effective number of parameters as a random variable
- ‘Bayesian’ information criterion (BIC)
- Leave-one-out cross-validation
- Comparing different estimates of out-of-sample prediction accuracy
- Example. Predictive error in the election forecasting model
- Example. Expected predictive accuracy of models for the eight schools
- Table 7.1 Deviance (−2 times log predictive density) and corrections for parameter fitting using AIC, DIC, WAIC (using the correction pWAIC2), and leave-one-out cross-validation for each of three models fitted to the data in Table 5.2. Lower values of AIC/DIC/WAIC imply higher predictive accuracy. Blank cells in the table correspond to measures that are undefined: AIC is defined relative to the maximum likelihood estimate and so is inappropriate for the hierarchical model; cross-validation requires prediction for the held-out case, which is impossible under the no-pooling model. The no-pooling model has the best raw fit to data, but after correcting for fitted parameters, the complete-pooling model has lowest estimated expected predictive error under the different measures. In general, we would expect the hierarchical model to win, but in this particular case, setting τ = 0 (that is, the complete-pooling model) happens to give the best average predictive performance.
- Evaluating predictive error comparisons
- Bias induced by model selection
- Challenges
- Example. A discrete example in which Bayes factors are helpful
- Example. A continuous example where Bayes factors are a distraction
- Sensitivity analysis
- Adding parameters to a model
- Accounting for model choice in data analysis
- Selection of predictors and combining information
- Alternative model formulations
- Practical advice for model checking and expansion
- Table 7.2 Summary statistics for populations of municipalities in New York State in 1960 (New York City was represented by its five boroughs); all 804 municipalities and two independent simple random samples of 100. From Rubin (1983a).
- Example. Estimating a population total under simple random sampling using transformed normal models
- Table 7.3 Short-term measurements of radon concentration (in picoCuries/liter) in a sample of houses in three counties in Minnesota. All measurements were recorded on the basement level of the houses, except for those indicated with asterisks, which were recorded on the first floor.
- Chapter 8 Modeling accounting for data collection
- 8.1 Bayesian inference requires a model for data collection
- Generality of the observed- and missing-data paradigm
- Table 8.1: Use of observed- and missing-data terminology for various data structures.
- Notation for observed and missing data
- Stability assumption
- Fully observed covariates
- Data model, inclusion model, and complete and observed data likelihood
- Joint posterior distribution of parameters θ from the sampling model and φ from the missing-data model
- Finite-population and superpopulation inference
- Ignorability
- ‘Missing at random’ and ‘distinct parameters’
- Ignorability and Bayesian inference under different data-collection schemes
- Propensity scores
- Unintentional missing data
- Simple random sampling of a finite population
- Stratified sampling
- Table 8.2 Results of a CBS News survey of 1447 adults in the United States, divided into 16 strata. The sampling is assumed to be proportional, so that the population proportions, Nj/N, are approximately equal to the sampling proportions, nj/n.
- Example. Stratified sampling in pre-election polling
- Figure 8.1 Values of for 1000 simulations from the posterior distribution for the election polling example, based on (a) the simple nonhierarchical model and (b) the hierarchical model. Compare to Figure 3.2.
- Table 8.3 Summary of posterior inference for the hierarchical analysis of the CBS survey in Table 8.2. The posterior distributions for the α1j's vary from stratum to stratum much less than the raw counts do. The inference for α2,16 for stratum 16 is included above as a representative of the 16 parameters α2j. The parameters μ1 and μ2 are transformed to the inverse-logit scale so they can be more directly interpreted.
- Example. A survey of Australian schoolchildren
- Example. Sampling of Alcoholics Anonymous groups
- Completely randomized experiments
- Table 8.4 Yields of plots of millet arranged in a Latin square. Treatments A, B, C, D, E correspond to spacings of width 2, 4, 6, 8, 10 inches, respectively. Yields are in grams per inch of spacing. From Snedecor and Cochran (1989).
- Randomized blocks, Latin squares, etc.
- Example. Latin square experiment
- Sequential designs
- Including additional predictors beyond the minimally adequate summary
- Example. An experiment with treatment assignments based on observed covariates
- Complete randomization
- Randomization given covariates
- Designs that ‘cheat’
- Bayesian analysis of nonrandomized studies
- Comparison to experiments
- Figure 8.2 Hypothetical-data illustrations of sensitivity analysis for observational studies. In each graph, circles and dots represent treated and control units, respectively. (a) The first plot shows balanced data, as from a randomized experiment, and the difference between the two lines shows the estimated treatment effect from a simple linear regression. (b, c) The second and third plots show unbalanced data, as from a poorly conducted observational study, with two different models fit to the data. The estimated treatment effect for the unbalanced data in (b) and (c) is highly sensitive to the form of the fitted model, even when the treatment assignment is ignorable.
- Bayesian inference for observational studies
- Table 8.5 Summary statistics from an experiment on vitamin A supplements, where the vitamin was available (but optional) only to those assigned the treatment. The table shows number of units in each assignment/exposure/outcome condition. From Sommer and Zeger (1991).
- Causal inference and principal stratification
- Example. A randomized experiment with noncompliance
- Complier average causal effects and instrumental variables
- Bayesian causal inference with noncompliance
- More complicated patterns of missing data
- Table 8.6 Yields of penicillin produced by four manufacturing processes (treatments), each applied in five different conditions (blocks). Four runs were made within each block, with the treatments assigned to the runs at random. From Box, Hunter, and Hunter (1978), who adjusted the data so that the averages are integers, a complication we ignore in our analysis.
- Table 8.7 Respondents to the CBS telephone survey classified by opinion and number of residential telephone lines (category ‘?’ indicates no response to the number of phone lines question).
- Table 8.8 Respondents to the CBS telephone survey classified by opinion, number of residential telephone lines (category ‘?’ indicates no response to the number of phone lines question), and number of adults in the household (category ‘?’ includes all responses greater than 8 as well as nonresponses).
- Chapter 9 Decision analysis
- 9.1 Bayesian decision theory in different contexts
- Bayesian inference and decision trees
- Summarizing inference and model selection
- 9.2 Using regression predictions: incentives for telephone surveys
- Background on survey incentives
- Figure 9.1 Observed increase zi in response rate vs. the increased dollar value of incentive compared to the control condition, for experimental data from 39 surveys. Prepaid and postpaid incentives are indicated by closed and open circles, respectively. (The graphs show more than 39 points because many surveys had multiple treatment conditions.) The lines show expected increases for prepaid (solid lines) and postpaid (dashed lines) cash incentives as estimated from a hierarchical regression.
- Data from 39 experiments
- Setting up a Bayesian meta-analysis
- Inferences from the model
- Figure 9.2 Residuals of response rate meta-analysis data plotted vs. predicted values. Residuals for telephone and face-to-face surveys are shown separately. As in Figure 9.1, solid and open circles indicate surveys with prepaid and postpaid incentives, respectively.
- Inferences about costs and response rates for the Social Indicators Survey
- Figure 9.3 Expected increase in response rate vs. net added cost per respondent, for prepaid (solid lines) and postpaid (dotted lines) incentives, for surveys of individuals and caregivers. On each plot, heavy lines correspond to the estimated effects, with light lines showing ±1 standard error bounds. The numbers on the lines indicate incentive payments. At zero incentive payments, estimated effects and costs are nonzero because the models have nonzero intercepts (corresponding to the effect of making any contact at all) and we are assuming a $1.25 mailing and processing cost per incentive.
- Loose ends
- 9.3 Multistage decision making: medical screening
- Example with a single decision point
- Adding a second decision point
- 9.4 Hierarchical decision analysis for radon measurement
- Figure 9.4 Lifetime added risk of lung cancer, as a function of average radon exposure in picoCuries per liter (pCi/L). The median and mean radon levels in ground-contact houses in the U.S. are 0.67 and 1.3 pCi/L, respectively, and over 50,000 homes have levels above 20 pCi/L.
- Background
- The individual decision problem
- Decision-making under certainty
- Bayesian inference for county radon levels
- Bayesian inference for the radon level in an individual house
- Decision analysis for individual homeowners
- Figure 9.5 Recommended radon remediation/measurement decision as a function of the perfect-information action level Raction and the prior geometric mean radon level e^M, under the simplifying assumption that e^S = 2.3. You can read off your recommended decision from this graph and, if the recommendation is ‘take a measurement,’ you can do so and then perform the calculations to determine whether to remediate, given your measurement. The horizontal axis of this figure begins at 2 pCi/L because remediation is assumed to reduce home radon level to 2 pCi/L, so it makes no sense for Raction to be lower than that value. Wiggles in the lines are due to simulation variability.
- Figure 9.6 Maps showing (a) fraction of houses in each county for which measurement is recommended, given the perfect-information action level of Raction = 4 pCi/L; (b) expected fraction of houses in each county for which remediation will be recommended, once the measurement y has been taken. For the present radon model, within any county the recommendations on whether to measure and whether to remediate depend only on the house type: whether the house has a basement and whether the basement is used as living space. Apparent discontinuities across the boundaries of Utah and South Carolina arise from irregularities in the radon measurements from the radon surveys conducted by those states, an issue we ignore here.
- Aggregate consequences of individual decisions
- Figure 9.7 Expected lives saved vs. expected cost for various radon measurement/remediation strategies. Numbers indicate values of Raction. The solid line is for the recommended strategy of measuring only certain homes; the others assume that all homes are measured. All results are estimated totals for the U.S. over a 30-year period.
- Advanced Computation
- Chapter 10 Introduction to Bayesian computation
- Normalized and unnormalized densities
- Log densities
- 10.1 Numerical integration
- Simulation methods
- Deterministic methods
- 10.2 Distributional approximations
- Crude estimation by ignoring some information
- 10.3 Direct simulation and rejection sampling
- Direct approximation by calculating at a grid of points
- Figure 10.1 Illustration of rejection sampling. The top curve is an approximation function, Mg(θ), and the bottom curve is the target density, p(θ|y). As required, Mg(θ) ≥ p(θ|y) for all θ. The vertical line indicates a single random draw θ from the density proportional to g. The probability that a sampled draw θ is accepted is the ratio of the height of the lower curve to the height of the higher curve at the value θ.
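Rejection sampling as depicted in Figure 10.1: draw θ from the proposal density g, then accept with probability p(θ|y)/(M g(θ)), where M g(θ) ≥ p(θ|y) everywhere. A minimal self-contained sketch with an unnormalized target of our own choosing (not from the book):

```python
import numpy as np

rng = np.random.default_rng(4)

def target_unnormalized(theta: float) -> float:
    """An unnormalized target density (illustrative choice, not from the
    book): exp(-theta^2/2) * (2 + sin(5*theta)) <= 3*exp(-theta^2/2)."""
    return np.exp(-0.5 * theta**2) * (2.0 + np.sin(5.0 * theta))

def rejection_sample(n_draws: int) -> np.ndarray:
    """Rejection sampling with a standard normal proposal g and constant
    M chosen so that M*g(theta) >= target(theta) for all theta."""
    M = 3.0 * np.sqrt(2.0 * np.pi)  # so M*g(theta) = 3*exp(-theta^2/2)
    draws = []
    while len(draws) < n_draws:
        theta = rng.standard_normal()                  # draw from g
        g = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)
        if rng.uniform() < target_unnormalized(theta) / (M * g):
            draws.append(theta)                        # accept
    return np.array(draws)

print(rejection_sample(5))
```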
- Simulating from predictive distributions
- Rejection sampling
- 10.4 Importance sampling
- Accuracy and efficiency of importance sampling estimates
- Importance resampling
- Uses of importance sampling in Bayesian computation
- 10.5 How many simulation draws are needed?
- Example. Educational testing experiments
- 10.6 Computing environments
- The Bugs family of programs
- Stan
- Other Bayesian software
- 10.7 Debugging Bayesian computing
- Debugging using fake data
- Model checking and convergence checking as debugging
- 10.8 Bibliographic note
- 10.9 Exercises
- Chapter 11 Basics of Markov chain simulation
- Figure 11.1 Five independent sequences of a Markov chain simulation for the bivariate unit normal distribution, with overdispersed starting points indicated by solid squares. (a) After 50 iterations, the sequences are still far from convergence. (b) After 1000 iterations, the sequences are nearer to convergence. Figure (c) shows the iterates from the second halves of the sequences; these represent a set of (correlated) draws from the target distribution. The points in Figure (c) have been jittered so that steps in which the random walks stood still are not hidden. The simulation is a Metropolis algorithm described in the example on page 278, with a jumping rule that has purposely been chosen to be inefficient so that the chains will move slowly and their random-walk-like aspect will be apparent.
- 11.1 Gibbs sampler
- Figure 11.2 Four independent sequences of the Gibbs sampler for a bivariate normal distribution with correlation ρ = 0.8, with overdispersed starting points indicated by solid squares. (a) First 10 iterations, showing the componentwise updating of the Gibbs iterations. (b) After 500 iterations, the sequences have reached approximate convergence. Figure (c) shows the points from the second halves of the sequences, representing a set of correlated draws from the target distribution.
- Example. Bivariate normal distribution
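For this example (zero means, unit variances, correlation ρ = 0.8 as in Figure 11.2), both full conditionals are univariate normals, θ1|θ2 ~ N(ρθ2, 1 − ρ²) and symmetrically for θ2, so each Gibbs iteration is two easy draws. A minimal sketch (the starting point and iteration count are our choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def gibbs_bivariate_normal(rho=0.8, n_iter=500, start=(4.0, -4.0)):
    """Gibbs sampler for a bivariate normal with zero means, unit
    variances, and correlation rho, alternating the full conditionals
    theta1 | theta2 ~ N(rho*theta2, 1 - rho^2) and vice versa."""
    sd = np.sqrt(1.0 - rho**2)
    theta1, theta2 = start  # an overdispersed starting point
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta1 = rng.normal(rho * theta2, sd)
        theta2 = rng.normal(rho * theta1, sd)
        draws[t] = (theta1, theta2)
    return draws

draws = gibbs_bivariate_normal()
print(np.corrcoef(draws[250:, 0], draws[250:, 1])[0, 1])  # near 0.8
```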
- 11.2 Metropolis and Metropolis-Hastings algorithms
- The Metropolis algorithm
- Example. Bivariate unit normal density with normal jumping kernel
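The random-walk Metropolis algorithm of this example proposes θ* = θ^(t−1) + jump and accepts with probability min(1, p(θ*|y)/p(θ^(t−1)|y)). A minimal sketch for the bivariate unit normal target; the small jumping scale here mimics the deliberately inefficient behavior described in Figure 11.1 (the exact scale used in the book may differ):

```python
import numpy as np

rng = np.random.default_rng(6)

def log_target(theta: np.ndarray) -> float:
    """Log density, up to a constant, of the bivariate unit normal."""
    return -0.5 * float(theta @ theta)

def metropolis(n_iter=2000, jump_scale=0.2, start=(2.5, 2.5)):
    """Random-walk Metropolis with a N(0, jump_scale^2 * I) jumping
    kernel; accept theta* with probability min(1, p(theta*)/p(theta))."""
    theta = np.array(start)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        proposal = theta + jump_scale * rng.standard_normal(2)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
            theta = proposal  # accept; otherwise stay at current theta
        draws[t] = theta
    return draws

draws = metropolis()
print(draws[1000:].mean(axis=0))  # should be near (0, 0)
```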
- Relation to optimization
- Why does the Metropolis algorithm work?
- The Metropolis-Hastings algorithm
- Relation between the jumping rule and efficiency of simulations
- 11.3 Using Gibbs and Metropolis as building blocks
- Interpretation of the Gibbs sampler as a special case of the Metropolis-Hastings algorithm
- Gibbs sampler with approximations
- 11.4 Inference and assessing convergence
- Difficulties of inference from iterative simulation
- Discarding early iterations of the simulation runs
- Dependence of the iterations in each sequence
- Figure 11.3 Examples of two challenges in assessing convergence of iterative simulations. (a) In the left plot, either sequence alone looks stable, but the juxtaposition makes it clear that they have not converged to a common distribution. (b) In the right plot, the two sequences happen to cover a common distribution but neither sequence appears stationary. These graphs demonstrate the need to use between-sequence and also within-sequence information when assessing convergence.
- Multiple sequences with overdispersed starting points
- Monitoring scalar estimands
- Challenges of monitoring convergence: mixing and stationarity
- Splitting each saved sequence into two parts
- Assessing mixing using between- and within-sequence variances
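The between/within comparison named above yields the potential scale reduction factor R̂ reported in Table 11.1: with m sequences of length n, B is n times the variance of the sequence means, W is the average within-sequence variance, and R̂ = sqrt(((n−1)W/n + B/n)/W). A minimal sketch (splitting each sequence in half, per the preceding item, is assumed already done):

```python
import numpy as np

def potential_scale_reduction(chains: np.ndarray) -> float:
    """R-hat for an (m sequences) x (n iterations) array of a scalar
    estimand: B is n times the variance of the sequence means, W is the
    average within-sequence variance, and
    R-hat = sqrt(((n-1)/n * W + B/n) / W)."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-sequence variance
    W = chains.var(axis=1, ddof=1).mean()     # within-sequence variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(7)
# Four stationary sequences: R-hat should be close to 1.
print(potential_scale_reduction(rng.standard_normal((4, 500))))
```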
- Table 11.1 95% central intervals and estimated potential scale reduction factors for three scalar summaries of the bivariate normal distribution simulated using a Metropolis algorithm. (For demonstration purposes, the jumping scale of the Metropolis algorithm was purposely set to be inefficient; see Figure 11.1.) Displayed are inferences from the second halves of five parallel sequences, stopping after 50, 500, 2000, and 5000 iterations. The intervals for n = ∞ are taken from the known marginal distributions for these summaries in the target distribution.
- Example. Bivariate unit normal density with bivariate normal jumping kernel (continued)
- 11.5 Effective number of simulation draws
- Bounded or long-tailed distributions
- Stopping the simulations
- Table 11.2 Coagulation time in seconds for blood drawn from 24 animals randomly allocated to four different diets. Different treatments have different numbers of observations because the randomization was unrestricted. From Box, Hunter, and Hunter (1978), who adjusted the data so that the averages are integers, a complication we ignore in our analysis.
- Data from a small experiment
- The model
- Starting points
- Gibbs sampler
- Table 11.3 Summary of inference for the coagulation example. Posterior quantiles and estimated potential scale reductions are computed from the second halves of ten Gibbs sampler sequences, each of length 100. Potential scale reductions for σ and τ are computed on the log scale. The hierarchical standard deviation, τ, is estimated less precisely than the unit-level standard deviation, σ, as is typical in hierarchical modeling with a small number of batches.
- Numerical results with the coagulation data
- The Metropolis algorithm
- Metropolis results with the coagulation data
- Table 11.4: Quality control measurements from 6 machines in a factory.
- Chapter 12 Computationally efficient Markov chain simulation
- 12.1 Efficient Gibbs samplers
- Transformations and reparameterization
- Auxiliary variables
- Example. Modeling the t distribution as a mixture of normals
- Parameter expansion
- Example. Fitting the t model (continued)
- Adaptive algorithms
- Slice sampling
- Reversible jump sampling for moving between spaces of differing dimensions
- Example. Testing a variance component in a logistic regression
- Simulated tempering and parallel tempering
- Particle filtering, weighting, and genetic algorithms
- The momentum distribution, p(φ)
- The three steps of an HMC iteration
- Restricted parameters and areas of zero posterior density
- Setting the tuning parameters
- Varying the tuning parameters during the run
- Locally adaptive HMC
- Combining HMC with Gibbs sampling
- Transforming to log τ
- Entering the data and model
- Setting tuning parameters in the warm-up phase
- No-U-turn sampler
- Inferences and postprocessing
- Chapter 13 Modal and distributional approximations
- 13.1 Finding posterior modes
- Conditional maximization
- Newton's method
- Quasi-Newton and conjugate gradient methods
- Numerical computation of derivatives
- 13.2 Boundary-avoiding priors for modal summaries
- Posterior modes on the boundary of parameter space
- Figure 13.1 Marginal posterior density, p(τ|y), for the standard deviation of the population of school effects θj in the educational testing example. If we were to choose to summarize this distribution by its mode, we would be in the uncomfortable position of setting an estimate on the boundary of parameter space.
- Figure 13.2 From a simple one-dimensional hierarchical model with scale parameter 0.5 and data in 10 groups: (a) Sampling distribution of the marginal posterior mode of τ under a uniform prior distribution, based on 1000 simulations of data from the model. (b) 100 simulations of the marginal likelihood, p(y|τ). In this example, the point estimate is noisy and the likelihood function is not very informative about τ.
- Figure 13.3 Various possible zero-avoiding prior densities for τ, the group-level standard deviation parameter in the 8 schools example. We prefer the gamma with 2 degrees of freedom, which hits zero at τ = 0 (thus ensuring a nonzero posterior mode) but clears zero for any positive τ. In contrast, the lognormal and inverse-gamma priors effectively shut off τ in some positive region near zero, or rule out high values of τ. These are behaviors we do not want in a default prior distribution. All these priors are intended for use in constructing penalized likelihood (posterior mode) estimates; if we were doing full Bayes and averaging over the posterior distribution of τ, we would be happy with a uniform or half-Cauchy prior density, as discussed in Section 5.7.
- Zero-avoiding prior distribution for a group-level variance parameter
- Boundary-avoiding prior distribution for a correlation parameter
- Figure 13.4 From a simulated varying-intercept, varying-slope hierarchical regression with identity group-level covariance matrix: (a) Sampling distribution of the maximum marginal likelihood estimate of the group-level correlation parameter, based on 1000 simulations of data from the model. (b) 100 simulations of the marginal profile likelihood, L_profile(ρ|y) = max_{τ1, τ2} p(y|τ1, τ2, ρ). In this example, the maximum marginal likelihood estimate is extremely variable and the likelihood function is not very informative about ρ. (In some cases, the profile likelihood for ρ is flat in some places; this occurs when the corresponding estimate of one of the variance parameters (τ1 or τ2) is zero, in which case ρ is not identified.)
- Degeneracy-avoiding prior distribution for a covariance matrix
- 13.3 Normal and related mixture approximations
- Fitting multivariate normal densities based on the curvature at the modes
- Laplace's method for analytic approximation of integrals
- Mixture approximation for multimodal densities
- Multivariate t approximation instead of the normal
- Sampling from the approximate posterior distributions
- 13.4 Finding marginal posterior modes using EM
- Derivation of the EM and generalized EM algorithms
- Implementation of the EM algorithm
- Example. Normal distribution with unknown mean and variance and partially conjugate prior distribution
- Extensions of the EM algorithm
- Supplemented EM and ECM algorithms
- Parameter-expanded EM (PX-EM)
- 13.5 Approximating conditional and marginal posterior densities
- Approximating the conditional posterior density, p(γ|φ, y)
- Approximating the marginal posterior density, p(φ|y), using an analytic approximation to p(γ|φ, y)
- 13.6 Example: hierarchical normal model (continued)
- Table 13.1 Convergence of stepwise ascent to a joint posterior mode for the coagulation example. The joint posterior density increases at each conditional maximization step, as it should. The posterior mode is in terms of log σ and log τ, but these values are transformed back to the original scale for display in the table.
- Crude initial parameter estimates
- Conditional maximization to find the joint mode of p(θ, μ, log σ, log τ|y)
- Factoring into conditional and marginal posterior densities
- Finding the marginal posterior mode of p(μ, log σ, log τ|y) using EM
- Table 13.2 Convergence of the EM algorithm to the marginal posterior mode of (μ, log σ, log τ) for the coagulation example. The marginal posterior density increases at each EM iteration, as it should. The posterior mode is in terms of log σ and log τ, but these values are transformed back to the original scale for display in the table.
- Table 13.3 Summary of posterior simulations for the coagulation example, based on draws from the normal approximation to p(μ, log σ, log τ|y) and the exact conditional posterior distribution, p(θ|μ, log σ, log τ, y). Compare to joint and marginal modes in Tables 13.1 and 13.2.
- Constructing an approximation to the joint posterior distribution
- Comparison to other computations
- 13.7 Variational inference
- Minimization of Kullback-Leibler divergence
- The class of approximate distributions
- The variational Bayes algorithm
- Example. Educational testing experiments
- Figure 13.5 Progress of variational Bayes for the parameters governing the variational approximation for the hierarchical model for the 8 schools. After a random starting point, the parameters require about 50 iterations to reach approximate convergence. The lower-right graph shows the Kullback-Leibler divergence KL(g||p) (calculated up to an arbitrary additive constant); KL(g||p) is guaranteed to uniformly decrease if the variational algorithm is programmed correctly.
- Figure 13.6 Progress of inferences for the effects in schools A, B, and C, for 100 iterations of variational Bayes. The lines and shaded regions show the median, 50% interval, and 90% interval for the variational distribution. Shown to the right of each graph are the corresponding quantiles for the full Bayes inference as computed via simulation.
- Proof that each step of variational Bayes decreases the Kullback-Leibler divergence
- Model checking
- Variational Bayes followed by importance sampling or particle filtering
- EM as a special case of variational Bayes
- More general forms of variational Bayes
- Expectation propagation for logistic regression
- Example. Bioassay logistic regression with two coefficients
- Figure 13.7 Progress of expectation propagation for a simple logistic regression with intercept and slope parameters. The bivariate normal approximating distribution is characterized by a mean and standard deviation in each dimension and a correlation. The algorithm reached approximate convergence after 4 iterations.
- Figure 13.8 (a) Progress of the normal approximating distribution during the iterations of expectation propagation. The small ellipse at the bottom (which is actually a circle if x and y axes are placed on a common scale) is the starting distribution; after a few iterations the algorithm converges. (b) Comparison of the approximating distribution from EP (solid ellipse) to the simple approximation based on the curvature at the posterior mode (dotted ellipse) and the exact posterior density (dashed oval). The exact distribution is not normal so the EP approximation is not perfect, but it is closer than the mode-based approximation. All curves show contour lines for the density at 0.05 times the mode (which for the normal distribution contains approximately 95% of the probability mass; see discussion on page 85).
- Extensions of expectation propagation
- Integrated nested Laplace approximation (INLA)
- Central composite design integration (CCD)
- Approximate Bayesian computation (ABC)
- Posterior computations involving an unknown normalizing factor
- Bridge and path sampling
- Regression Models
- Chapter 14 Introduction to regression models
- 14.1 Conditional modeling
- Notation
- Formal Bayesian justification of conditional modeling
- 14.2 Bayesian analysis of the classical regression model
- Notation and basic model
- The standard noninformative prior distribution
- The posterior distribution
- Sampling from the posterior distribution
- The posterior predictive distribution for new data
- Model checking and robustness
- 14.3 Regression for causal inference: incumbency in congressional elections
- Units of analysis, outcome, and treatment variables
- Figure 14.1 U.S. congressional elections: Democratic proportion of the vote in contested districts in 1986 and 1988. Dots and circles indicate districts that in 1988 had incumbents running and open seats, respectively. Points on the left and right halves of the graph correspond to the incumbent party being Republican or Democratic.
- Setting up control variables so that data collection is approximately ignorable
- Implicit ignorability assumption
- Transformations
- Figure 14.2 Incumbency advantage over time: posterior median and 95% interval for each election year. The inference for each year is based on a separate regression. As an example, the results from the regression for 1988, based on the data in Figure 14.1, are displayed in Table 14.1.
- Posterior inference
- Model checking and sensitivity analysis
- Table 14.1 Inferences for parameters in the regression estimating the incumbency advantage in 1988. The outcome variable is the incumbent party's share of the two-party vote in 1988, and only districts that were contested by both parties in both 1986 and 1988 were included. The parameter of interest is the coefficient of incumbency. Data are displayed in Figure 14.1. The posterior median and 95% interval for the coefficient of incumbency correspond to the bar for 1988 in Figure 14.2.
- Figure 14.3 Standardized residuals, from the incumbency advantage regressions for the 1980s, vs. Democratic vote in the previous election. (The subscript t indexes the election years.) Dots and circles indicate district elections with incumbents running and open seats, respectively.
- Table 14.2 Summary of district elections that are ‘outliers’ (defined as having absolute (unstandardized) residuals from the regression model of more than 0.2) for the incumbency advantage example. Elections are classified as open seats or incumbent running; for each category, the observed proportion of outliers is compared to the posterior predictive distribution. Both observed proportions are far higher than expected under the model.
- 14.4 Goals of regression analysis
- Predicting y from x for new observations
- Causal inference
- Do not control for post-treatment variables when estimating the causal effect.
- 14.5 Assembling the matrix of explanatory variables
- Identifiability and collinearity
- Nonlinear relations
- Indicator variables
- Categorical and continuous variables
- Interactions
- Controlling for irrelevant variables
- Selecting the explanatory variables
- 14.6 Regularization and dimension reduction for multiple predictors
- Lasso
- 14.7 Unequal variances and correlations
- Modeling unequal variances and correlated errors
- Bayesian regression with a known covariance matrix
- Bayesian regression with unknown covariance matrix
- Variance matrix known up to a scalar factor
- Weighted linear regression
- Parametric models for unequal variances
- Estimating several unknown variance parameters
- Example. Estimating the incumbency advantage (continued)
- Figure 14.4 Posterior medians of standard deviations σ1 and σ2 for elections with incumbents (solid line) and open-seat elections (dotted line), 1898–1990, estimated from the model with two variance components. (These years are slightly different from those in Figure 14.2 because this model was fit to a slightly different dataset.)
- General models for unequal variances
- Coding prior information on a regression parameter as an extra ‘data point'
- Interpreting prior information on several coefficients as several additional ‘data points'
- Prior information about variance parameters
- Prior information in the form of inequality constraints on parameters
- Table 14.3 Data from the earliest study of metabolic rate and body surface area, measured on a set of dogs. From Schmidt-Nielsen (1984, p. 78).
- Chapter 15 Hierarchical linear models
- 15.1 Regression coefficients exchangeable in batches
- Simple varying-coefficients model
- Intraclass correlation
- Mixed-effects model
- Several sets of varying coefficients
- Exchangeability
- 15.2 Example: forecasting U.S. presidential elections
- Unit of analysis and outcome variable
- Figure 15.1 (a) Democratic share of the two-party vote for president, for each state, in 1984 and 1988. (b) Democratic share of the two-party vote for president, for each state, in 1972 and 1976.
- Preliminary graphical analysis
- Fitting a preliminary, nonhierarchical, regression model
- Table 15.1 Variables used for forecasting U.S. presidential elections. Sample minima, medians, and maxima come from the 511 data points. All variables are signed so that an increase in a variable would be expected to increase the Democratic share of the vote in a state. ‘Inc’ is defined to be +1 or −1 depending on whether the incumbent President is a Democrat or a Republican. ‘Presinc’ equals Inc if the incumbent President is running for reelection and 0 otherwise. ‘Dem. share of state vote’ in last election and two elections ago are coded as deviations from the corresponding national votes, to allow for a better approximation to prior independence among the regression coefficients. ‘Proportion Catholic’ is the deviation from the average proportion in 1960, the only year in which a Catholic ran for President. See Gelman and King (1993) and Boscardin and Gelman (1996) for details on the other variables, including a discussion of the regional/subregional variables. When fitting the hierarchical model, we also included indicators for years and regions within years.
- Checking the preliminary regression model
- Figure 15.2 Scatterplot showing the joint distribution of simulation draws of the realized test quantity, T(y, β)—the square root of the average of the 11 squared nationwide residuals—and its hypothetical replication, T(y^rep, β), under the nonhierarchical model for the election forecasting example. The 200 simulated points are far below the 45° line, which means that the realized test quantity is much higher than predicted under the model.
- Extending to a varying-coefficients model
- Forecasting
- Posterior inference
- Figure 15.3 Scatterplot showing the joint distribution of simulation draws of the realized test quantity, T(y, β)—the square root of the average of the 11 squared nationwide residuals—and its hypothetical replication, T(y^rep, β), under the hierarchical model for the election forecasting example. The 200 simulated points are scattered evenly about the 45° line, which means that the model accurately fits this particular test quantity.
- Reasons for using a hierarchical model
- 15.3 Interpreting a normal prior distribution as additional data
- Interpretation as a single linear regression
- More than one way to set up a model
- 15.4 Varying intercepts and slopes
- Inverse-Wishart model
- Scaled inverse-Wishart model
- Predicting business school grades for different groups of students
- 15.5 Computation: batching and transformation
- Gibbs sampler, one batch at a time
- All-at-once Gibbs sampler
- Parameter expansion
- Example. Election forecasting (continued)
- Transformations for HMC
- Example. Eight schools model
- Notation and model
- Computation
- Finite-population and superpopulation standard deviations
- Example. Five-way factorial structure for data on Web connect times
- Figure 15.4 Anova display for the World Wide Web data. The bars indicate 50% and 95% intervals for the finite-population standard deviations sm. The display makes apparent the magnitudes and uncertainties of the different components of variation. Since the data are on the logarithmic scale, the standard deviation parameters can be interpreted directly. For example, sm = 0.20 corresponds to a coefficient of variation of exp(0.2) − 1 ≈ 0.2 on the original scale, and so the exponentiated coefficients in this batch correspond to multiplicative increases or decreases in the range of 20%. (The dots on the bars show simple classical estimates of the variance components that can be used as starting points in a Bayesian computation.) (This arithmetic is reproduced in the sketches after this table of contents.)
- Figure 15.5 Posterior medians, 50%, and 95% intervals for standard deviation parameters σk estimated from a split-plot latin square experiment. (a) The left plot shows inferences given uniform prior distributions on the σk's. (b) The right plot shows inferences given a hierarchical half-Cauchy model with scale fit to the data. The half-Cauchy model gives much sharper inferences, using the partial pooling that comes with fitting a hierarchical model.
- Superpopulation and finite-population standard deviations
- Figure 15.6 Posterior medians, 50%, and 95% intervals for finite-population standard deviations sk estimated from a split-plot latin square experiment. (a) The left plot shows inferences given uniform prior distributions on the σk's. (b) The right plot shows inferences given a hierarchical half-Cauchy model with scale fit to the data. The half-Cauchy model gives sharper estimates even for these finite-population standard deviations, indicating the power of hierarchical modeling for these highly uncertain quantities. Compare to Figure 15.5 (which is on a different scale).
- Table 15.2 Data from a chemical experiment, from Marquardt and Snee (1975). The first three variables are experimental manipulations, and the fourth is the outcome measurement.
- Chapter 16 Generalized linear models
- 16.1 Standard generalized linear model likelihoods
- Continuous data
- Poisson
- Binomial
- Overdispersed models
- 16.2 Working with generalized linear models
- Canonical link functions
- Offsets
- Interpreting the model parameters
- Understanding discrete-data models in terms of latent continuous data
- Bayesian nonhierarchical and hierarchical generalized linear models
- Noninformative prior distributions on β
- Conjugate prior distributions
- Nonconjugate prior distributions
- Hierarchical models
- Normal approximation to the likelihood
- Example. The binomial-logistic model
- Approximate normal posterior distribution
- More advanced computational methods
- 16.3 Weakly informative priors for logistic regression
- The problem of separation
- Example. Predicting vote from sex, ethnicity, and income
- Table 16.1 Estimates and standard errors from logistic regressions (with uniform prior distributions) predicting Republican vote intention in pre-election polls, fit separately to survey data from four presidential elections from 1960 through 1972. The estimates are reasonable except in 1964, where there is complete separation (with none of the black respondents supporting the Republican candidate, Barry Goldwater).
- Figure 16.1 Profile likelihood (in this case, essentially the posterior distribution given a uniform prior distribution) of the coefficient of black from the logistic regression of Republican vote in 1964 (displayed in the lower left of Table 16.1), conditional on point estimates of the other coefficients in the model. The maximum occurs as β → −∞, indicating that the best fit to the data would occur at this unreasonable limit.
- Computation with a specified normal prior distribution
- Approximate EM algorithm with a t prior distribution
- Default prior distribution for logistic regression coefficients
- Figure 16.2 (solid line) Cauchy density with scale 2.5, (dashed line) t7 density with scale 2.5, (dotted line) likelihood for θ corresponding to a single binomial trial of probability logit−1(θ) with one-half success and one-half failure. All these curves favor values below 5 in absolute value; we choose the Cauchy as our default model because it allows the occasional probability of larger values. (A sketch of this density comparison appears after this table of contents.)
- Figure 16.3 Estimates from maximum likelihood and Bayesian logistic regression with the recommended default prior distribution for the bioassay example (data in Table 3.1 on page 74). In addition to graphing the fitted curves (at right), we show raw computer output to illustrate how our approach would be used in routine practice. The big change in the estimated coefficient for z.x when going from glm to bayesglm may seem surprising at first, but upon reflection we prefer the second estimate with its lower coefficient for x, which is based on downweighting the most extreme possibilities that are allowed by the likelihood.
- Other models
- Bioassay example
- Example. Predicting voting from ethnicity (continued)
- Weakly informative default prior compared to actual prior information
- Figure 16.4 The left column shows the estimated coefficients (±1 standard error) for a logistic regression predicting probability of Republican vote for president given sex, race, and income, as fit separately to data from the National Election Study for each election 1952 through 2000. (The binary inputs female and black have been centered to have means of zero, and the numerical variable income has been centered and then rescaled by dividing by two standard deviations.) The complete separation in 1964 led to a coefficient estimate of −∞ that year. (The particular finite values of the estimate and standard error are determined by the number of iterations used by the glm function in R before stopping.) The other columns show estimates for the same model fit each year using independent Cauchy, t7, and normal prior distributions, each with center 0 and scale 2.5. All three prior distributions do a reasonable job at stabilizing the estimates for 1964, while leaving the estimates for other years essentially unchanged.
- Aggregate data
- Regression analysis to control for precincts
- Figure 16.5 Estimated rates exp(αe) at which people of different ethnic groups were stopped for different categories of crime, as estimated from hierarchical regressions (16.12) using previous year's arrests as a baseline and controlling for differences between precincts. Separate analyses were done for the precincts that had less than 10%, 10%–40%, and more than 40% black population. For the most common stops—violent crimes and weapons offenses—blacks and hispanics were stopped about twice as often as whites. Rates are plotted on a logarithmic scale.
- Figure 16.6 Anova display for two logistic regression models of the probability that a survey respondent prefers the Republican candidate for the 1988 U.S. presidential election, based on data from seven CBS News polls. Point estimates and error bars show posterior medians, 50% intervals, and 95% intervals of the finite-population standard deviations sm. The demographic factors are those used by CBS to perform their nonresponse adjustments, and states and regions are included because we were interested in estimating average opinions by state. The large effects for ethnicity, region, and state suggest that it might make sense to include interactions, hence the inclusion of the ethnicity × region and ethnicity × state effects in the second model.
- Multivariate outcomes
- Example. Meta-analysis with binomial outcomes
- Table 16.2 Summary of posterior inference for the bivariate analysis of the meta-analysis of the beta-blocker trials in Table 5.4. All effects are on the log-odds scale. Inferences are similar to the results of the univariate analysis of logit differences in Section 5.6: compare the individual study effects to Table 5.4 and the mean and standard deviation of average logits to Table 5.5. ‘Study 1 avg logit’ is included above as a representative of the 22 parameters β1j. (We would generally prefer to display all these inferences graphically but use tables here to give a more detailed view of the posterior inferences.)
- Table 16.3 Subset of the data from the 1988–1989 World Cup of chess: results of games between eight of the 29 players. Results are given as wins, losses, and draws; for example, when playing with the white pieces against Kasparov, Karpov had one win, no losses, and one draw. For simplicity, this table aggregates data from all six tournaments.
- Example. World Cup chess
- The Poisson or multinomial likelihood
- Setting up the matrix of explanatory variables
- Prior distributions
- Computation
- Chapter 17 Models for robust inference
- 17.1 Aspects of robustness
- Robustness of inferences to outliers
- Sensitivity analysis
- 17.2 Overdispersed versions of standard probability models
- The t distribution in place of the normal
- Negative binomial alternative to Poisson
- Beta-binomial alternative to binomial
- The t distribution alternative to logistic and probit regression
- Why ever use a nonrobust model?
- 17.3 Posterior inference and computation
- Notation for robust model as expansion of a simpler model
- Gibbs sampling using the mixture formulation
- Sampling from the posterior predictive distribution for new data
- Computing the marginal posterior distribution of the hyperparameters by importance weighting
- Approximating the robust posterior distributions by importance resampling
- 17.4 Robust inference and sensitivity analysis for the eight schools
- Robust inference based on a t4 population distribution
- Table 17.1 Summary of 2500 simulations of the treatment effects in the eight schools, using the t4 population distribution in place of the normal. Results are similar to those obtained under the normal model and displayed in Table 5.3.
- Sensitivity analysis based on tν distributions with varying values of ν
- Figure 17.1 Posterior means and standard deviations of treatment effects as functions of ν, on the scale of 1/ν, for the sensitivity analysis of the educational testing example. The values at 1/ν=0 come from the simulations under the normal distribution in Section 5.5. Much of the scatter in the graphs is due to simulation variability.
- Treating ν as an unknown parameter
- Figure 17.2 Posterior simulations of 1/ν from the Gibbs-Metropolis computation of the robust model for the educational testing example, with ν treated as unknown.
- Discussion
- 17.5 Robust regression using t-distributed errors
- Iterative weighted linear regression and the EM algorithm
- Gibbs sampler and Metropolis algorithm
- 17.6 Bibliographic note
- 17.7 Exercises
- Table 17.2 Observed distribution of the word ‘may’ in papers of Hamilton and Madison, from Mosteller and Wallace (1964). Out of the 247 blocks of Hamilton's text studied, 128 had no instances of ‘may,’ 67 had one instance of ‘may,’ and so forth, and similarly for Madison.
- Chapter 18 Models for missing data
- 18.1 Notation
- 18.2 Multiple imputation
- Computation using EM and data augmentation
- Inference with multiple imputations
- 18.3 Missing data in the multivariate normal and t models
- Finding posterior modes using EM
- Drawing samples from the posterior distribution of the model parameters
- Extending the normal model using the t distribution
- Nonignorable models
- 18.4 Example: multiple imputation for a series of polls
- Background
- Multivariate missing-data framework
- A hierarchical model for multiple surveys
- Use of the continuous model for discrete responses
- Figure 18.1 Approximate monotone data pattern for 51 polls conducted during the 1988 U.S. presidential election campaign. Not all questions were asked in all surveys.
- Computation
- Accounting for survey design and weights
- Results
- Figure 18.2 Estimates and standard error bars for the population mean response for two questions: (a) income (in thousands of dollars), and (b) perceived ideology of Dukakis (on a 1–7 scale from liberal to conservative), over time. Each symbol represents a different survey, with different letters indicating different survey organizations. The size of the letter indicates the number of responses to the question, with large-sized letters for surveys with nearly complete response and small-sized letters for surveys with few responses. Circled letters indicate surveys for which the question was not asked; the estimates for these surveys have much larger standard errors. The inner brackets on the vertical bars show the within-imputation standard deviation for the average from each poll.
- Table 18.1 3 × 3 × 3 table of results of 1990 preplebiscite survey in Slovenia, from Rubin, Stern, and Vehovar (1995). We treat ‘don't know’ responses as missing data. Of most interest is the proportion of the electorate whose ‘true’ answers are ‘yes’ on both ‘independence’ and ‘attendance.'
- Crude estimates
- The likelihood and prior distribution
- The model for the ‘missing data'
- Using the EM algorithm to find the posterior mode of θ
- Using SEM to estimate the posterior variance matrix and obtain a normal approximation
- Multiple imputation using data augmentation
- Posterior inference for the estimand of interest
- Nonlinear and Nonparametric Models
- Chapter 19 Parametric nonlinear models
- 19.1 Example: serial dilution assay
- Figure 19.1 Typical setup of a plate with 96 wells for a serial dilution assay. The first two columns are dilutions of ‘standards’ with known concentrations, and the other columns are ten different ‘unknowns.’ The goal of the assay is to estimate the concentrations of the unknowns, using the standards as calibration.
- Figure 19.2 Data from a single plate of a serial dilution assay. The large graph shows the calibration data, and the ten small graphs show data for the unknown compounds. The goal of the analysis is to figure out how to scale the x-axes of the unknowns so they line up with the curve estimated from the standards. (The curves shown on these graphs are estimated from the model as described in Section 19.1.)
- Laboratory data
- Figure 19.3 Example of measurements y from a plate as analyzed by standard software used for dilution assays. The standards data are used to estimate the calibration curve, which is then used to estimate the unknown concentrations. The concentrations indicated by asterisks are labeled as ‘below detection limit.’ However, information is present in these low observations, as can be seen by noting the decreasing pattern of the measurements across the successive dilutions in each sample.
- The model
- Inference
- Figure 19.4 Posterior medians, 50% intervals, and 95% intervals for the concentrations of the 10 unknowns for the data displayed in Figure 19.2. Estimates are obtained for all the samples, even Unknown 8, all of whose data were ‘below detection limit’ (see Figure 19.3).
- Figure 19.5 Standardized residuals (yi − E(yi|xi))/sd(yi|xi) vs. expected values E(yi|xi), for the model fit to standards and unknown data from a single plate. Circles and crosses indicate measurements from standards and unknowns, respectively. No major problems appear with the model fit.
- Comparison to existing estimates
- Figure 19.6 Estimated fraction of PERC metabolized, as a function of steady-state concentration in inhaled air, for 10 hypothetical individuals randomly selected from the estimated population of young adult white males.
- 19.2 Example: population toxicokinetics
- Background
- Figure 19.7 Concentration of PERC (in milligrams per liter) in exhaled air and in blood, over time, for one of two replications in each of six experimental subjects. The measurements are displayed on logarithmic scales.
- Toxicokinetic model
- Difficulties in estimation and the role of prior information
- Measurement model
- Population model for parameters
- Prior information
- Joint posterior distribution for the hierarchical model
- Computation
- Inference for quantities of interest
- Figure 19.8 Posterior inferences for the quantities of interest—the fraction metabolized at high and low exposures—for each of the six subjects in the PERC experiment. The scatter within each plot represents posterior uncertainty about each person's metabolism. The variation among these six persons represents variation in the population studied of young adult white males.
- Evaluating the fit of the model
- Figure 19.9 Observed PERC concentrations (for all individuals in the study) divided by expected concentrations, plotted vs. expected concentrations. The x and y-axes are on different (logarithmic) scales: observations vary by a factor of 10,000, but the relative errors are mostly between 0.8 and 1.25. Because the expected concentrations are computed based on a random draw of the parameters from their posterior distribution, the figure shows the actual misfit estimated by the model, without the need to adjust for fitting.
- Figure 19.10 External validation data and 95% predictive intervals from the model fit to the PERC data. The model predictions fit the data reasonably well but not in the first 15 minutes of exposure, a problem we attribute to the fact that the model assumes that all compartments are in instantaneous equilibrium, whereas this actually takes about 15 minutes to approximately hold.
- Use of a complex model with an informative prior distribution
- 19.3 Bibliographic note
- Table 19.1 Number of attempts and successes of golf putts, by distance from the hole, for a sample of professional golfers. From Berry (1996).
- 19.4 Exercises
- Chapter 20 Basis function models
- 20.1 Splines and weighted sums of basis functions
- Figure 20.1 Single Gaussian (solid line) and cubic B-spline (dashed line) basis functions scaled to have the same width. The X marks the center of the Gaussian basis function, and the circles mark the location of knots for the cubic B-spline.
- Figure 20.2 (a) A set of cubic B-splines with equally spaced knots. (b) A set of random draws from the B-spline prior for μ(x) based on the basis functions in the left graph, assuming independent standard normal priors for the basis coefficients.
- Figure 20.3 A small dataset of concentration of chloride over time in a biology experiment. Data points are circles, the linear regression estimate is shown with a dotted line, and the posterior mean curve using B-splines is the curved solid line.
- Example. Chloride concentration
- 20.2 Basis selection and shrinkage of coefficients
- Example. Chloride concentration (continued)
- Shrinkage priors
- 20.3 Non-normal models and multivariate regression surfaces
- Other error distributions
- Example. Chloride concentration (continued)
- Multivariate regression surfaces
- Example. A nonparametric regression function that is constrained to be nondecreasing
- Figure 20.4 Estimated probability of preterm birth as a function of DDE dose. The solid line is the posterior mean based on a Bayesian nonparametric regression constrained to be nondecreasing, and the dashed lines are 95% posterior intervals for the probability at each point. The dotted line is the maximum likelihood estimate for the unconstrained generalized additive model.
- Figure 20.5 Proportion of survey respondents who reported knowing someone gay, and who supported a law allowing same-sex marriage, as a function of age. Can you fit curves through these points using splines or Gaussian processes?
- Chapter 21 Gaussian process models
- 21.1 Gaussian process regression
- Figure 21.1 Random draws from the Gaussian process prior with squared exponential covariance function and different values of the amplitude parameter τ and the length scale parameter l. (A minimal sketch of generating such draws appears after this table of contents.)
- Covariance functions
- Inference
- Covariance function approximations
- Figure 21.2 Posterior draws of a Gaussian process μ(x) fit to ten data points, conditional on three different choices of the parameters τ, l that characterize the process. Compare to Figure 21.1, which shows draws of the curve from the prior distribution of each model. In our usual analysis, we would assign a prior distribution to τ, l and then perform joint posterior inference for these parameters along with the curve μ(x); see Figure 21.3. We show these three choices of conditional posterior distribution here to give a sense of the role of τ, l in posterior inference.
- Marginal likelihood and posterior
- Figure 21.3 Marginal posterior distributions for Gaussian process parameters τ, l and error scale σ, and posterior mean and pointwise 90% bands for μ(x), given the same ten data points from Figure 21.2.
- Decomposing the time series as a sum of Gaussian processes
- Figure 21.4 Relative number of births in the United States based on exact data from each day from 1969 through 1988, divided into different components, each with an additive Gaussian process model. The estimates from an improved model are shown in Figure 21.5.
- An improved model
- Figure 21.5 Relative number of births in the United States based on exact data from each day from 1969 through 1988, divided into different components, each with an additive Gaussian process model. Compared to Figure 21.4, this improved model allows individual effects for every day of the year, not merely for a few selected dates.
- Example. Leukemia survival times
- Figure 21.6 For the leukemia example, estimated conditional comparison for each predictor with other predictors fixed to their mean values or defined values. The thick line in each graph is the posterior median estimated using a Gaussian process model, and the thin lines represent pointwise 90% intervals.
- Density estimation
- Figure 21.7 Two simple examples of density estimation using Gaussian processes. Left column shows acidity data and right column shows galaxy data. Top row shows histograms and bottom row shows logistic Gaussian process density estimate means and 90% pointwise posterior intervals.
- Example. One-dimensional densities: galaxies and lakes
- Density regression
- Latent-variable regression
- Chapter 22 Finite mixture models
- 22.1 Setting up and interpreting mixture models
- Finite mixtures
- Continuous mixtures
- Identifiability of the mixture likelihood
- Prior distribution
- Ensuring a proper posterior distribution
- Number of mixture components
- More general formulation
- Mixtures as true models or approximating distributions
- Basics of computation for mixture models
- Crude estimates
- Posterior modes and marginal approximations using EM and variational Bayes
- Posterior simulation using the Gibbs sampler
- Posterior inference
- 22.2 Example: reaction times and schizophrenia
- Initial statistical model
- Figure 22.1 Logarithms of response times (in milliseconds) for 11 non-schizophrenic individuals (above) and 6 schizophrenic individuals (below). All histograms are on a common scale, and there are 30 measurements for each person.
- Crude estimate of the parameters
- Finding the modes of the posterior distribution using ECM
- Normal and t approximations at the major mode
- Simulation using the Gibbs sampler
- Table 22.1 Posterior quantiles for parameters of interest under the old and new mixture models for the reaction time experiment. Introducing the new mixture parameter ω, which represents the proportion of schizophrenics with attentional delays, changes the interpretation of the other parameters in the model.
- Possible difficulties at a degenerate point
- Inference from the iterative simulations
- Posterior predictive distributions
- Checking the model
- Expanding the model
- Figure 22.2 Scatterplot of the posterior predictive distribution of two test quantities: the smallest and largest observed within-schizophrenic variances. The x represents the observed value of the test quantity in the dataset.
- Checking the new model
- Figure 22.3 Scatterplot of the posterior predictive distribution, under the expanded model, of two test quantities: the smallest and largest within-schizophrenic variance. The x represents the observed value of the test quantity in the dataset.
- Figure 22.4 Histograms overlain with nonparametric density estimates. Top row shows galaxy data, bottom row shows acidity data. The three columns show Gaussian kernel density estimation, estimated densities, and estimated clusters.
- Example. Simple mixtures fit to small datasets
- Table 22.2 Posterior mean and standard deviation of weight, location, and scale parameters for the five mixture components fit to the galaxy data displayed in the top row of Figure 22.4.
- Table 22.3 Posterior mean and standard deviation of weight, location, and scale parameters for the five mixture components fit to the acidity data displayed in the bottom row of Figure 22.4.
- Table 22.4 Posterior mean and standard deviation of weight, location, and scale parameters for the six mixture components fit to the iris data, in which each data point is characterized by four continuous predictors.
- Classification
- Regression
- Chapter 23 Dirichlet process models
- 23.1 Bayesian histograms
- 23.2 Dirichlet process prior distributions
- Definition and basic properties
- Stick-breaking construction
- Figure 23.1 Samples from the stick-breaking representation of the Dirichlet process with different settings of the precision parameter α. (A sketch of truncated stick-breaking weights appears after this table of contents.)
- Specification and Polya urns
- Blocked Gibbs sampler
- Hyperprior distribution
- Example. A toxicology application
- Figure 23.2 Histogram of the number of implantations per pregnant mouse in the control group (black line) and posterior mean of Pr(y = j) assuming a Dirichlet process prior on the distribution of the number of implants with α = 1, 5 (gray and black dotted lines, respectively) and base measure
- Figure 23.3 Histogram of a subsample of size 10 from the control group on implantation in mice (black line) and posterior mean of Pr(y = j) assuming a DP prior on the distribution of the number of implants with α = 1, 5 (gray and black dotted lines, respectively) and base measure
- Nonparametric residual distributions
- Nonparametric models for parameters that vary by group
- Functional data analysis
- Example. A genotoxicity application
- Figure 23.4 Histograms and kernel-smoothed density estimates of DNA damage across cells in each hydrogen peroxide dose group in the genotoxicity example.
- Dependent Dirichlet processes
- Example. Genotoxicity application (continued)
- Figure 23.5 Directed graph illustrating order restriction in the genotoxicity model. Arrows point toward stochastically larger groups. Posterior probabilities of H1k are also shown.
- Figure 23.6 Genotoxicity application. Estimated densities of the Olive tail moment in a subset of the H2O2 dose x repair groups. Solid curves are the posterior mean density estimates and dashed curves provide pointwise 95% credible intervals.
- Dependent stick-breaking processes
- Example. Glucose tolerance prediction
- Figure 23.7 Data from glucose-tolerance study: y = 2-hour glucose level (mg/dl); x1 = insulin sensitivity; x2 = age; x3 = waist to hip ratio; x4 = body-mass index; x5 = diastolic blood pressure; x6 = systolic blood pressure.
- Figure 23.8 Predictive (dashed) conditional response density p(y|x) and 95% credible intervals (dash-dotted) with normalized x1 (insulin sensitivity) and x2 (age) varying among 5th, 50th, 95th empirical percentiles.
- Appendix A Standard probability distributions
- A.1 Continuous distributions
- Uniform
- Univariate normal
- Table A.1 Continuous distributions
- Table A.2 Discrete distributions
- Lognormal
- Multivariate normal
- Gamma
- Inverse-gamma
- Chi-square
- Inverse chi-square
- Exponential
- Weibull
- Wishart
- Inverse-Wishart
- LKJ correlation
- Beta
- Dirichlet
- Constrained distributions
- A.2 Discrete distributions
- Poisson
- Binomial
- Multinomial
- Negative binomial
- Beta-binomial
- A.3 Bibliographic note
- Appendix B Outline of proofs of asymptotic theorems
- Mathematical framework
- Convergence of the posterior distribution for a discrete parameter space
- Convergence of the posterior distribution for a continuous parameter space
- Convergence of the posterior distribution to normality
- Multivariate form
- B.1 Bibliographic note
- Appendix C Computation in R and Stan
- C.1 Getting started with R and Stan
- C.2 Fitting a hierarchical model in Stan
- Stan model file
- R script for data input, starting values, and running Stan
- Figure C.1 Numerical output from the print() function applied to the Stan code of the hierarchical model for the educational testing example. For each parameter, mean is the estimated posterior mean (computed as the average of the saved simulation draws), se_mean is the estimated standard error (that is, Monte Carlo uncertainty) of the mean of the simulations, and sd is the standard deviation. Thus, as the number of simulation draws approaches infinity, se_mean approaches zero while sd approaches the posterior standard deviation of the parameter. Then come several quantiles, then the effective sample size n_eff (formula (11.8) on page 287) and the potential scale reduction factor R̂ (see (11.4) on page 285). When all the simulated chains have mixed, R̂ is near 1. Beyond this, the effective sample size and standard errors give a sense of whether the simulations suffice for practical purposes. Each line of the table shows inference for a single scalar parameter in the model, with the last line displaying inference for the unnormalized log posterior density calculated at each step in Stan. (The relation among se_mean, sd, and n_eff is illustrated in the sketches after this table of contents.)
- Accessing the posterior simulations in R
- Posterior predictive simulations and graphs in R
- Alternative prior distributions
- Using the t model
- C.3 Direct simulation, Gibbs, and Metropolis in R
- Marginal and conditional simulation for the normal model
- Gibbs sampler for the normal model
- Gibbs sampling for the t model with fixed degrees of freedom
- Gibbs-Metropolis sampling for the t model with unknown degrees of freedom
- Parameter expansion for the t model
- C.4 Programming Hamiltonian Monte Carlo in R
- C.5 Further comments on computation
- C.6 Bibliographic note
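A few of the captions above quote small computations. The following sketches reproduce them in R; all of this code is illustrative only and is not taken from the book. First, the Figure 13.8 caption states that a contour at 0.05 times the density at the mode contains approximately 95% of the probability mass of a bivariate normal distribution. A minimal check in base R:

    # The density of a bivariate normal relative to its mode is exp(-d2/2),
    # where d2 is the squared Mahalanobis distance from the mean.
    # Solving exp(-d2/2) = 0.05 for d2:
    d2 <- -2 * log(0.05)   # approximately 5.99
    # The squared distance follows a chi-squared distribution with 2 df,
    # so the probability mass inside that contour is:
    pchisq(d2, df = 2)     # approximately 0.95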
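The Figure 15.4 caption converts a standard deviation of 0.20 on the natural-log scale into a coefficient of variation on the original scale. The same arithmetic:

    # Multiplicative variation implied by an sd of 0.2 on the log scale:
    exp(0.2) - 1   # approximately 0.22, i.e. effects in the range of 20%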
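The Figure 16.2 caption compares a Cauchy density with scale 2.5 to a t7 density with the same scale as candidate default priors for logistic regression coefficients. A sketch of that comparison; the plotting range is an assumption of this illustration:

    # Cauchy(0, 2.5) default prior (solid) vs. t with 7 df, scale 2.5 (dashed)
    curve(dcauchy(x, location = 0, scale = 2.5), from = -10, to = 10,
          ylab = "density")
    # Density of a scaled t variable: dt(x/scale, df) / scale
    curve(dt(x / 2.5, df = 7) / 2.5, add = TRUE, lty = 2)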
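Figure 21.1 shows random draws from a Gaussian process prior with squared exponential covariance, amplitude τ, and length scale l. A minimal sketch of how one such draw can be generated; the grid and the small jitter term are assumptions of this illustration:

    # One draw from a GP prior with squared exponential covariance
    x   <- seq(0, 10, length.out = 100)
    tau <- 1   # amplitude parameter
    l   <- 2   # length scale parameter
    K   <- tau^2 * exp(-outer(x, x, "-")^2 / (2 * l^2))
    # Jitter keeps the Cholesky factorization numerically stable
    L   <- t(chol(K + 1e-8 * diag(length(x))))
    mu  <- drop(L %*% rnorm(length(x)))   # sampled curve mu(x) on the grid
    plot(x, mu, type = "l")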
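Figure 23.1 illustrates the stick-breaking representation of the Dirichlet process for different values of the precision parameter α. A sketch of truncated stick-breaking weights; the truncation level K is an assumption of this illustration:

    # Truncated stick-breaking weights for a Dirichlet process, precision alpha
    stick_breaking <- function(alpha, K = 100) {
      v <- rbeta(K, 1, alpha)        # Beta(1, alpha) stick fractions
      v * cumprod(c(1, 1 - v[-K]))   # pi_k = v_k * prod_{j<k} (1 - v_j)
    }
    w <- stick_breaking(alpha = 5)
    sum(w)   # close to 1 for a large truncation level K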
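Finally, the Figure C.1 caption explains that in Stan's printed summary se_mean shrinks toward zero as the number of draws grows, while sd stabilizes at the posterior standard deviation. The two are linked through the effective sample size:

    # Monte Carlo standard error of a posterior mean: sd / sqrt(n_eff)
    mc_se <- function(sd, n_eff) sd / sqrt(n_eff)
    mc_se(sd = 6.4, n_eff = 2000)   # hypothetical values, for illustration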
ABOUT EBOOKS ON HEIMKAUP.IS
Your bookshelf is your own space, where your books are stored. You can reach your bookshelf anytime and anywhere, on a computer or smart device. Simple and convenient!
Ebook to own
An ebook you own must be downloaded to the devices you want to use it on within one year of purchase.
Access your books anywhere
You can open all your ebooks and e-textbooks in an instant, wherever and whenever, in your bookshelf. No bag, no Kindle, no hassle (let alone excess baggage).
Easy browsing and search
You can move between pages and chapters however suits you best and jump straight to a given chapter from the table of contents. The search finds words, chapters, or pages in a single click.
Notes and highlights
You can mark passages in different colors and write notes in the ebook as you wish. You can even see classmates' and teachers' notes and highlights if they allow it. All in one place.
You decide how the page looks
You adapt the page to your needs. Enlarge or shrink images and text with multi-level zoom to view the page the way that suits your studies best.
More benefits
- You can print pages from the book (within the limits set by the publisher)
- Optional links to other digital and interactive content, such as videos or questions on the material
- Easy to copy and paste content/text for, e.g., homework or essays
- Supports technology that assists students with visual or hearing impairments