3 Simple Steps to Perform Linear Regression in Excel
Introduction
In this blog post, I'll show you 3 simple steps to create a linear regression model in Excel. Linear regression is important because it can tell you important trends between two or more variables in the data. So, let's get started!
The Dataset
The Kaggle dataset I'll use in this post shows all the sale prices for the houses in the King County area (or the Seattle area). Here's the download link, if you want to follow along: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction (Credit to Thom Milkovic on Unsplash)
Step 1: Come Up with Your Statistical Question
Before we create the model, we need to know what variables we're interested in. For this post, my question is going to be "What is the relationship between the square feet in the living room and the price?"
So, my variables are going to be the square feet and the price. You can choose your own variables, but you need to make sure that they're continuous variables, or else you're gonna run into a problem.
Now, we need to know what our independent, which is on the x-axis, and our dependent, which is on the y-axis, variables are. In this example, it makes sense to have the house price as the dependent variable because we're assuming that the number of square feet in the house is going to affect what the price is.
In the next step, we're going to hop over onto Excel.
Step 2: Selecting the 2 Columns of Interest in Excel
First, you have to select the second cell in the column that has all the square feets and then do Ctrl+Shift+Down:
Then, hold the Ctrl key and click on the second cell in the price column and do Ctrl+Shift+Down again:
Step 3: Create the Scatterplot
Now, go to "Insert" and click on this button over here:
But, there's actually a problem:
You can see that all the prices are on the x-axis, which is NOT what we want.
So to fix this, you have to right-click on the graph and click "Select Data":
This is going to open a new window that looks like this:
From here, click on "Edit". This is going to open another window:
In the "Series X values" box, get rid of everything inside and starting from the 2nd cell, highlight all the values in the square feet column. In the "Series Y values" box, do the same thing but this time, highlight all the values in the price column.
So, the boxes should look like this:
Now, the scatterplot looks a lot better now, with the square feet on the x-axis and the prices on the y-axis:
Want to See More?
If you want to see me create a regression line over this scatterplot and analyze what the model tells us, check out my full video!
(Insert the link)
I hope to see you over there!