In mathematics and statistics, linear regression is a method for modeling the relationship between two variables using a straight line — called the line of best fit — that best represents the trend in a scatter plot. The regression equation is written as ŷ = a + bx, where b is the slope (rate of change) and a is the y-intercept (predicted value when x = 0). Linear regression appears in Florida’s MAFS.912.S-ID.6 standards, AP Statistics, and the SAT Math Problem Solving & Data Analysis section.

"Regression" Explained

Plug the points into a Desmos table and use its regression feature to get the equation.

Linear Regression in Math — Definition, Line of Best Fit & SAT Guide

Formal definition: Linear regression is a statistical method for finding the straight line that best models the relationship between two quantitative variables in a data set. This line, called the regression line or line of best fit, minimizes the sum of the squared vertical distances between itself and the data points in the scatter plot. The regression equation predicts the value of the dependent variable (ŷ, pronounced “y-hat”) for any given value of the independent variable (x), making it a tool for analysis and prediction.
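To make the definition concrete, here is a minimal Python sketch of how a least-squares line can be computed by hand. The function name fit_line and the small data set are hypothetical, chosen only for illustration; a calculator or Desmos does the same job in practice.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


# Hypothetical data (not from this guide): hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
a, b = fit_line(hours, scores)
print(f"y-hat = {a:.2f} + {b:.2f}x")  # prints: y-hat = 47.70 + 4.10x
```

The returned pair follows the statistical convention used in this guide: a is the y-intercept and b is the slope.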


Where you’ll see it: Linear regression appears in Florida Algebra 2 and Pre-Calculus courses, AP Statistics, Florida MAFS.912.S-ID.6 standards governing linear models for data, the SAT Math Problem Solving & Data Analysis section (scatter plots and line of best fit questions), and the ACT Mathematics section. It is the foundation for the predictive statistical modeling taught in the Florida high school curriculum.

The Linear Regression Formula — ŷ = a + bx

The regression equation has the same structure as slope-intercept form (y = mx + b) but with different notation and a specific meaning for each variable. Understanding what each component represents in a real-world context is more important than memorizing the equation — it is the interpretation skill that the SAT, FSA, and MAFS.912.S-ID.6 all test.

Linear Regression Cheat Sheet
Linear Regression Equation
ŷ = a + bx
ŷ (y-hat) = the predicted value of the dependent variable (what we're estimating)
x = the independent variable (the input or predictor value)
b = slope – the predicted change in ŷ for every 1-unit increase in x
a = y-intercept – the predicted value of ŷ when x = 0

Note on notation: some Florida textbooks and the SAT use ŷ = mx + b or ŷ = bx + a – the variables may be arranged differently. Identify slope (the coefficient of x) and y-intercept (the constant term) regardless of notation.

Calculator output: Most Florida students generate regression equations on a TI-84. The calculator displays "y = ax + b" – where a is the slope and b is the y-intercept (opposite of the statistical notation). Always verify which convention your exam uses.

How to Interpret the Slope in Context
Slope (b) = "for every 1-unit increase in x, ŷ increases/decreases by b"

The slope of a regression line is always interpreted in real-world units – not as an abstract number.

Example: If the equation is ŷ = 12.5 + 3.2x, where x = study hours and ŷ = predicted test score:

Slope interpretation: "For every additional hour of studying, the predicted test score increases by 3.2 points."
Y-intercept interpretation: "A student who studies 0 hours is predicted to score 12.5 points."

SAT application: "What does the value 3.2 represent in this context?" – This is the most missed regression question type. The answer is always a rate: "the predicted change in [y-variable] per 1-unit change in [x-variable]."
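The interpretation pattern can be sketched as a small Python helper that builds the sentence from the slope and its units. The function name interpret_slope and its parameters are hypothetical; the numbers come from the study-hours example above.

```python
def interpret_slope(b, x_unit, y_label, y_unit):
    """Build a units-based interpretation sentence for a regression slope b."""
    if b == 0:
        return f"The predicted {y_label} does not change as x increases."
    direction = "increases" if b > 0 else "decreases"
    return (f"For every additional {x_unit}, the predicted {y_label} "
            f"{direction} by {abs(b)} {y_unit}.")


# Slope from y-hat = 12.5 + 3.2x (x = study hours, y-hat = predicted test score).
print(interpret_slope(3.2, "hour of studying", "test score", "points"))
# prints: For every additional hour of studying, the predicted test score increases by 3.2 points.
```

The point of the template is that the answer always names both units, not only the number.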

TI-84 Calculator Notation vs. Statistical Notation
Statistical: ŷ = a + bx (a = intercept, b = slope) | TI-84: y = ax + b (a = slope, b = intercept)

The TI-84's LinReg(ax+b) output swaps a and b compared to AP Statistics textbook notation. (The calculator also offers LinReg(a+bx), which matches the statistical convention.) This causes systematic errors when students copy calculator output into exam answers.

Rule: Identify slope by its position – the coefficient of x is always slope. The standalone constant is always the y-intercept. This applies regardless of variable name.
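One way to avoid the letter mix-up is to store the line by role (slope, intercept) instead of by letter. The sketch below is a hypothetical illustration in Python; the class and function names are not from any calculator or textbook.

```python
from typing import NamedTuple


class Line(NamedTuple):
    slope: float
    intercept: float


def from_ti84_ax_plus_b(a, b):
    """TI-84 LinReg(ax+b) convention: a is the slope, b is the y-intercept."""
    return Line(slope=a, intercept=b)


def from_statistical_a_plus_bx(a, b):
    """Statistical convention y-hat = a + bx: a is the y-intercept, b is the slope."""
    return Line(slope=b, intercept=a)


# The same line, y-hat = 12.5 + 3.2x, entered under each convention.
ti = from_ti84_ax_plus_b(a=3.2, b=12.5)
stat = from_statistical_a_plus_bx(a=12.5, b=3.2)
print(ti == stat)  # prints: True
```

Both calls describe the same line because the coefficient of x is the slope in either notation.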

Linear Regression — 3 Worked Examples

Example 1 – Read the Equation
Easy
A scatter plot shows a line of best fit with equation ŷ = 4.5 + 2.3x, where x is the number of practice problems completed and ŷ is the predicted quiz score. What is the slope? What is the y-intercept?
Step 1: Identify the slope – the coefficient of x → b = 2.3
Step 2: Identify the y-intercept – the constant term → a = 4.5
Step 3: Interpret slope → for every 1 additional practice problem completed, the predicted quiz score increases by 2.3 points
Step 4: Interpret y-intercept → a student who completes 0 practice problems is predicted to score 4.5 points
Answer: Slope = 2.3 (score increases 2.3 pts per problem) · Y-intercept = 4.5 (predicted score at 0 problems)
Example 2 – SAT Context Interpretation
Medium
A scatter plot models the relationship between outdoor temperature (°F) and daily ice cream sales ($). The line of best fit is ŷ = −320 + 15x. What does the value 15 represent in this context? What does −320 represent?
Step 1: Identify 15 → it is the slope (coefficient of x)
Step 2: Interpret slope in context → for every 1°F increase in temperature, daily ice cream sales are predicted to increase by $15
Step 3: Identify −320 → it is the y-intercept (the constant term)
Step 4: Interpret y-intercept in context → at 0°F, daily sales are predicted to be −$320 (a meaningless value in context – y-intercepts are often outside the practical range of data)
SAT insight: the negative y-intercept is not an error. It simply means 0°F is outside the range of the data, so the prediction there has no practical meaning. SAT questions on a model like this are more likely to ask what the slope "represents" than to ask for the predicted value at x = 0.
Answer: 15 = predicted sales increase per 1°F · −320 = predicted sales at 0°F (outside practical range)
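As a quick check of Example 2, the Python sketch below evaluates the line at two temperatures one degree apart and at 0°F. The function name predicted_sales and the test temperatures 70 and 71 are hypothetical choices for illustration.

```python
def predicted_sales(temp_f):
    """Predicted daily ice cream sales ($) from y-hat = -320 + 15x."""
    return -320 + 15 * temp_f


# A 1-degree increase changes the prediction by exactly the slope.
print(predicted_sales(71) - predicted_sales(70))  # prints: 15

# At x = 0 the prediction equals the y-intercept, which is negative here.
print(predicted_sales(0))  # prints: -320
```

The difference of 15 does not depend on which pair of adjacent temperatures is used, which is why the slope is read as a rate.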
Example 3 – Prediction & Residual
Hard – SAT / AP Statistics
Using ŷ = −320 + 15x (from Example 2), predict daily sales when the temperature is 85°F. If actual sales on an 85°F day were $975, find the residual. Is the model an overestimate or underestimate for this data point?
Step 1: Substitute x = 85 into the equation → ŷ = −320 + 15(85) = −320 + 1,275 = 955
Step 2: Predicted sales at 85°F → ŷ = $955
Step 3: Calculate residual → Residual = Actual − Predicted = 975 − 955 = +20
Step 4: Interpret residual sign → positive residual (+20) means actual > predicted → the model UNDERESTIMATED sales for this data point
AP Statistics note: a positive residual means the point is ABOVE the regression line. A negative residual means the point is BELOW it. This sign convention is tested on both the AP Statistics exam and SAT.
Answer: Predicted = $955 · Residual = +$20 · Model underestimated (actual was $20 higher than predicted)
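The arithmetic in Example 3 can be reproduced with a short Python sketch. The function name residual is a hypothetical helper; the values are the ones given in the example.

```python
def residual(actual, predicted):
    """Residual = actual (observed) minus predicted (y-hat)."""
    return actual - predicted


predicted = -320 + 15 * 85      # prediction at 85°F
res = residual(975, predicted)  # actual sales were $975

if res > 0:
    verdict = "underestimated"  # point lies above the line
elif res < 0:
    verdict = "overestimated"   # point lies below the line
else:
    verdict = "exact"

print(predicted, res, verdict)  # prints: 955 20 underestimated
```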

Linear Regression on the SAT — Problem Solving & Data Analysis

Scatter plots with lines of best fit appear on every SAT Math section — typically 2–4 questions in Problem Solving & Data Analysis. Florida students who can read a scatter plot visually but have not been taught the SAT’s specific regression question types consistently miss these points despite understanding the underlying concept. Knowing the question formats in advance — not just the formula — is what converts regression knowledge into SAT points.

SAT Question Type Frequency Table
SAT Question Type | What It Tests | Frequency
Interpret the slope in context | "What does the value X represent in this situation?" – slope as rate of change in real-world units | 2–3× per test
Make a prediction from the equation | Substitute an x-value into ŷ = a + bx to find the predicted y-value | 1–2× per test
Identify overestimate or underestimate | Compare a specific data point to the regression line – above line = underestimate, below = overestimate | 1× per test
Interpret the y-intercept in context | What does the y-intercept mean "in this situation"? – often outside practical range | 1× per test
Describe the association | Identify positive/negative/no association from scatter plot direction or r-value sign | 1× per test
SAT Strategy Card
SAT Strategy – Regression Questions
Read the units · Identify slope as rate · Never say "correlation proves causation"

Rule 1: For "what does X represent?" questions – always answer with units: "for every 1 [x-unit] increase, ŷ increases by [b] [y-units]." Never just write the number.

Rule 2: For over/underestimate questions – find the data point on the scatter plot. Above the line → actual > predicted → underestimate. Below → overestimate.

Rule 3: Never conclude causation from a regression equation. If the SAT asks "does this prove that X causes Y?" – the answer is always no. Regression shows association, not causation.

Correlation Coefficient — How Strong Is the Relationship?

The correlation coefficient r measures the strength and direction of the linear relationship between two variables. It always falls between −1 and +1. Knowing r tells you how reliably the regression line predicts y — but r alone does not tell you whether x causes y. This distinction — correlation versus causation — is tested on both the SAT and AP Statistics exam.
Correlation Coefficient Cheat Sheet
Correlation Coefficient (r)
−1 ≤ r ≤ +1 · Sign = direction · |r| = strength
r = +1: perfect positive linear relationship (all points on a line, sloping upward)
r = −1: perfect negative linear relationship (all points on a line, sloping downward)
r = 0: no linear relationship
|r| close to 1: strong linear association · |r| close to 0: weak linear association

r² (coefficient of determination): the proportion of variation in y that is explained by the linear relationship with x. Example: r = 0.8 → r² = 0.64 → the regression model explains 64% of the variation in y.
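For readers who want to see where r and r² come from, here is a minimal Python sketch of the standard Pearson correlation formula. The function name correlation and the five data points are hypothetical and not taken from this guide.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # prints: 0.775 0.6
```

Here r is about 0.775, so r² is 0.6: under this model, 60% of the variation in y is explained by the linear relationship with x.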

✓ Strong Association
|r| ≥ 0.8
Points cluster tightly around the regression line. Slope direction (positive or negative) is clear. Predictions from the equation are reliable within the data range.
~ Moderate Association
0.5 ≤ |r| < 0.8
Points show a trend but with more spread. Predictions are directionally accurate but less precise. The regression equation is still useful for general estimation.
× Weak / No Association
|r| < 0.5
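The strength bands can be sketched as a small Python lookup. The function name association_strength is hypothetical, and the cutoff for the weakest band is an assumption inferred from the two bands stated above (everything below 0.5).

```python
def association_strength(r):
    """Label the strength of a linear association from r, using this guide's bands."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # the sign gives direction; only |r| measures strength
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    # Assumption: the remaining band (|r| below 0.5) is weak or no association.
    return "weak or none"


print(association_strength(0.8))    # prints: strong
print(association_strength(-0.65))  # prints: moderate
print(association_strength(0.2))    # prints: weak or none
```

These cutoffs are a classroom rule of thumb, so other textbooks may draw the lines in slightly different places.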

4 Common Mistakes with Linear Regression

Concluding that correlation proves causation. A regression equation shows that x and y are associated — not that x causes y. “Ice cream sales and drowning rates are positively correlated” does not mean ice cream causes drowning (both increase in summer heat). On the SAT, any answer choice that uses causal language (“proves that X causes Y”) is always wrong for regression questions. Fix: regression establishes association. Never use causal language (“causes,” “leads to,” “proves”) when interpreting a regression equation. Use “is associated with,” “predicts,” or “is related to” instead.
Confusing TI-84 notation with statistical notation. The TI-84 displays “y = ax + b” where a = slope and b = intercept. AP Statistics textbooks and many SAT problems use “ŷ = a + bx” where a = intercept and b = slope. Students copy calculator output and misidentify which value is the slope. Fix: always identify slope by its position. The coefficient of x (the number multiplying x) is the slope, regardless of what letter the calculator uses. When in doubt, check which value is multiplied by x: that value is the slope, and the standalone constant is the y-intercept.
Extrapolating far beyond the data range. Using the regression equation to predict y at x-values far outside the observed data range produces unreliable estimates. The linear model only applies within (or close to) the range of x-values in the original data set. Fix: note the range of x-values in the data. If asked to predict at an x-value far outside that range, the answer is always “unreliable” or “should not be trusted” — the SAT uses this as a data literacy question type.
Getting residual sign wrong. Students calculate residual as “predicted minus actual” instead of “actual minus predicted.” A positive residual means the actual value was higher than predicted (the data point is above the line). A negative residual means actual was lower (point is below the line). The wrong subtraction order flips every over/underestimate answer. Fix: Residual = Actual (observed) − Predicted (ŷ). This order never changes. Positive = above line = model underestimated. Negative = below line = model overestimated.
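The extrapolation mistake in particular lends itself to a simple guard. The Python sketch below returns a prediction together with a flag for whether x lies inside the observed data range. The function name and the 60–95°F range are hypothetical assumptions for illustration; the line is the ice cream model from Example 2.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (y_hat, within_range) for y-hat = a + b*x and an observed x-range."""
    y_hat = a + b * x
    within_range = x_min <= x <= x_max
    return y_hat, within_range


# Assumed observed range of temperatures in the data: 60°F to 95°F (hypothetical).
print(predict_with_range_check(-320, 15, 85, 60, 95))  # prints: (955, True)
print(predict_with_range_check(-320, 15, 0, 60, 95))   # prints: (-320, False)
```

A False flag does not change the arithmetic; it only signals that the prediction should be treated as unreliable.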


Frequently Asked Questions — Linear Regression

Linear regression is a statistical method for finding the straight line — called the line of best fit — that best models the relationship between two quantitative variables in a scatter plot. The regression equation is written as ŷ = a + bx, where b is the slope (rate of change) and a is the y-intercept (predicted value when x = 0). Linear regression is covered in Florida’s MAFS.912.S-ID.6 standards and tested on the SAT Math Problem Solving & Data Analysis section.

The line of best fit is the straight line in a scatter plot that minimizes the total distance (specifically, the sum of squared vertical distances) between the line and all the data points. It is also called the least-squares regression line or the regression line. The line of best fit is used to identify the trend in the data and to make predictions about values of y for given values of x. The equation of the line of best fit is ŷ = a + bx in statistical notation.
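Written out, the least-squares criterion and the resulting formulas look like this. The symbols n (number of data points), x_i and y_i (the individual data values), and x̄ and ȳ (the means) are notation assumed here for illustration; they do not appear elsewhere in this guide.

```latex
% Quantity being minimized: the sum of squared vertical distances
\[
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
\]

% Slope and y-intercept that minimize S(a, b)
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```

The second formula also shows that the least-squares line always passes through the point (x̄, ȳ).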

The slope of a regression line represents the predicted change in the dependent variable (y) for every 1-unit increase in the independent variable (x). Always interpret slope with units: “for every 1 [x-unit] increase, y is predicted to increase/decrease by [slope value] [y-units].” On the SAT, “what does the value X represent in this situation?” questions always require this full unit-based interpretation — the number alone is never the complete answer.

Linear regression and scatter plot questions appear on every SAT Math section under Problem Solving & Data Analysis — typically 2–4 questions per test. The most common types include interpreting the slope in context (2–3 per test), making a prediction from the regression equation (1–2 per test), identifying whether the regression line overestimates or underestimates a specific data point (1 per test), and describing the association strength and direction from r-value or scatter plot appearance (1 per test). Mastering these question types requires knowing the formats in advance, not just the formula.

Yes. InLighten’s certified math tutors in Orlando specialize in statistics including linear regression — covering the regression equation ŷ = a + bx, slope and y-intercept interpretation in context, scatter plot reading, residuals, and the SAT Problem Solving & Data Analysis question types that most Florida students miss. We also support AP Statistics students with correlation coefficients, r², and regression analysis. We diagnose exactly where your student loses points before building targeted sessions around those gaps. Book a free math assessment to get started.

Missing Points on Regression? Let InLighten's Orlando Math Tutors Find Exactly Where.

Knowing the regression equation is not the same as earning points on it. The SAT’s “what does the slope represent in this context?” questions are missed not because students don’t know the formula — but because they’ve never been taught to attach units and real-world meaning to a number. InLighten’s certified math tutors in Orlando specialize in that gap: translating regression knowledge into the specific answer format the SAT scores. For AP Statistics students, we go further — residuals, r², the least-squares criterion, and the inference procedures that appear on the AP exam. Florida student-athletes tracking Bright Futures scholarship requirements and NCAA academic eligibility have dual academic pressure: SAT Math scores and AP course GPA both matter. Our tutors build a plan that addresses both, session by session, with concrete score improvement targets.