Introduction to data
Scientists seek to answer questions using rigorous methods and careful observations. These observations, collected from sources such as field notes, surveys, and experiments, form the backbone of a statistical investigation and are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. It is helpful to put statistics in the context of a general process of investigation:
- Identify a question or problem.
- Collect relevant data on the topic.
- Analyze the data.
- Form a conclusion.
In this tutorial, we focus on the first two steps of this process: identifying a question and collecting relevant data. Future tutorials will focus on the last two steps: analyzing the data and forming a conclusion.
Learning objectives
- Internalize the language of data.
- Load and view a dataset in SAS and distinguish between various variable types and data representations.
- Classify a study as observational or experimental, and determine whether the study’s results can be generalized to the population and whether they suggest correlation or causation between the variables studied.
- Distinguish between various sampling strategies and recognize the benefits and drawbacks of choosing one strategy over another.
- Identify the principles of experimental design and recognize their purposes.
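The sampling strategies named in the objectives above differ mainly in which units are randomized and at what stage. The tutorials carry these schemes out in SAS. Below is a minimal, language-neutral sketch in Python (not SAS code) that contrasts them on a small hypothetical population of 12 units. The `stratum` and `cluster` labels, the population itself, and all function names are invented for illustration only.

```python
import random

# Hypothetical population of 12 units. Units 0-5 are in stratum "A" and
# units 6-11 in stratum "B". Clusters 0-3 each hold three consecutive units.
population = [
    {"id": i, "stratum": "A" if i < 6 else "B", "cluster": i // 3}
    for i in range(12)
]

def simple_random_sample(units, n, rng):
    """Simple random sample: every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, n_per_stratum, rng):
    """Stratified sample: split units into strata, then take a simple
    random sample of fixed size within every stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u["stratum"], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, n_clusters, rng):
    """Cluster sample: randomly choose whole clusters and keep every unit in them."""
    clusters = sorted({u["cluster"] for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if u["cluster"] in chosen]

def multistage_sample(units, n_clusters, n_per_cluster, rng):
    """Multistage sample: randomly choose clusters, then take a simple
    random sample within each chosen cluster."""
    clusters = sorted({u["cluster"] for u in units})
    chosen = rng.sample(clusters, n_clusters)
    sample = []
    for c in chosen:
        members = [u for u in units if u["cluster"] == c]
        sample.extend(rng.sample(members, n_per_cluster))
    return sample

rng = random.Random(2024)  # fixed seed so the draws are reproducible
print("SRS:", [u["id"] for u in simple_random_sample(population, 4, rng)])
print("Stratified:", [u["id"] for u in stratified_sample(population, 2, rng)])
print("Cluster:", [u["id"] for u in cluster_sample(population, 2, rng)])
print("Multistage:", [u["id"] for u in multistage_sample(population, 2, 2, rng)])
```

A stratified sample is guaranteed to include units from every stratum. A cluster sample keeps whole groups together, which is often cheaper to collect. A multistage sample trades some of that convenience for more spread within the chosen clusters.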
Tutorials
- Load data from the course library within the SAS On-Demand environment
- Introduce datasets
- Discuss variable types, connecting terminology from the textbook to SAS
- Perform basic data manipulation
- Introduce the idea of confounding variables
- Demonstrate Simpson’s Paradox
- Define observational studies and experiments
- Discuss the scope of inference using the 2×2 grid of random assignment versus random sampling
- Define simple random sampling, stratified sampling, cluster sampling, and multistage sampling
- Use SAS to obtain different types of samples
- Discuss benefits and drawbacks of choosing one sampling scheme over another
- Identify the principles of experimental design and discuss the purpose of each principle
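Simpson's Paradox, listed among the tutorials above, occurs when a relationship that holds within every subgroup reverses once the subgroups are pooled. This usually happens because a confounding variable is unevenly distributed across the groups being compared. The following is a minimal Python sketch with invented counts. The treatments "A" and "B", the "mild"/"severe" severity levels, and every number are hypothetical and not taken from any study.

```python
# Hypothetical success counts as (successes, trials) for two treatments,
# split by a possible confounder (case severity). All numbers are invented.
data = {
    "A": {"mild": (8, 10), "severe": (30, 90)},
    "B": {"mild": (70, 90), "severe": (3, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

def overall_rate(by_group):
    """Pool all subgroups together before computing the proportion."""
    successes = sum(s for s, _ in by_group.values())
    trials = sum(t for _, t in by_group.values())
    return successes / trials

for treatment, by_group in data.items():
    parts = ", ".join(
        f"{group}: {rate(s, t):.2f}" for group, (s, t) in by_group.items()
    )
    print(f"{treatment} -> {parts}; overall: {overall_rate(by_group):.2f}")
```

Within each severity level, treatment A has the higher success rate (0.80 vs 0.78 for mild cases, 0.33 vs 0.30 for severe cases). Pooled, however, B looks far better (0.73 vs 0.38). The reason is that A was given mostly to severe cases, so severity confounds the pooled comparison.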