Interactive SAS tutorials supporting the OpenIntro Introduction to Modern Statistics textbook.
This tutorial combines the statistical inference ideas from the 5th tutorial with the linear regression modeling ideas covered in the 3rd and 4th tutorials. The tutorial starts by understanding how the estimated linear model will change from sample to sample, and you will be able to use that variability to make conclusions about the unknown population linear model. Both interval estimates of the slope and statistical significance will be determined using randomization methods as well as t-based models. Intervals for predicting a single response are shown in contrast to predictions of an average response.
The tutorial will continue using tidy modeling procedures from the tidyverse package, including broom as a way to work with linear model output.
Recalling how and why we perform inferential analyses on any model, we focus on research questions in which populations are described by a linear model. The variability of the estimated linear model will be compared under different sample sizes (for the data) and different population structures.
Using the R package infer, a full randomization test for the statistical significance of the slope parameter is performed. Additionally, a bootstrap analysis is done to find a confidence interval for the slope.
In contrast to the computational methods in the previous lesson, here you will use the t-distribution as a mathematical model for both statistical significance of the slope as well as confidence interval for the slope. The confidence interval representing an average response (at given value of x) is contrasted with a prediction interval representing an individual response (at a given value of x).
In order for the t-based methods (in the previous lesson) to produce meaningful results, it is important to ensure that the data (and underlying population structure) is set-up in a way consistent with the linear model specifications. Fortunately, there are adjustments to the structure (e.g., variable transformations) that can be made so that the technical conditions hold and produce meaningful conclusions (and are described in the following lesson).
Building on the simple linear regression model (i.e., the model with only one explanatory variable), you will transform variables to produce models which are more characteristic of the underlying regression structure. Additionally, multiple variables will be considered, and interpretations of the variables’ coefficients will be examined.