Regression

__Regression:__ a measure of the relation between the mean value of one variable and corresponding values of other variables

When a group of data is collected and plotted, it is often graphed in what is known as a scatter plot. Each specific point represents a relationship between the x and y variables that are explained on the axes. Together, the collection of points can either be random, or form a line or curve. Regression can be used to determine the relationship (or "correlation") that is formed, and allows one to make inferences about the data from observing the scatter plot.

__Linear Regression__: as one variable increases or decreases, the other variable increases or decreases in a linear fashion

 * [[image:line align="right"]]Least Squares Regression Line (LSRL): line of best fit for the data set
 * The line of best fit is the line that minimizes the sum of the squared vertical distances from the data points to the line
 * LSRL equation is in the format y=ax+b (y=mx+b)
 * Line used to predict values
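
As a rough illustration of what the calculator does behind the scenes, the slope and intercept of the LSRL can be computed directly from the data. This is only a sketch: the function name `least_squares_line` and the sample data are made up for this example and are not part of any calculator or library.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimizes
    the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (x with y) and of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # y-intercept
    return a, b


# Made-up sample data (L1 = xs, L2 = ys)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f}x + {b:.2f}")

# Use the line to predict y for a new x value
predicted = a * 6 + b
print(f"predicted y at x = 6: {predicted:.2f}")
```

The last two lines show the "line used to predict values" idea: once a and b are known, any x can be plugged in.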

Residuals:

 * the value of a residual for a specific point on the scatterplot is its vertical distance from the LSRL
 * this tells how far off a point is from the value predicted by the LSRL equation
 * a positive residual means that the value is higher than what is expected, and a negative residual means that the value is lower than what is expected
 * residual=observed-predicted
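
The formula residual=observed-predicted can be sketched in a few lines. The data and the line y = 0.6x + 2.2 below are made-up illustration values (that line is the least squares fit for this particular data set), and the helper name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Return observed minus predicted for each point, using y = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Made-up sample data and its least squares line y = 0.6x + 2.2
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 0.6, 2.2

res = residuals(xs, ys, a, b)
for x, y, r in zip(xs, ys, res):
    # Positive residual: point is above the line; negative: below the line
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x={x}, observed={y}, residual={r:+.2f} ({position} the line)")
```

For a least squares line, the residuals add up to (approximately) zero, which is a quick way to check the arithmetic.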

**How to Find it on a Calculator:**
1. Create a list: To create the list of points for your scatter plot, click the STAT button. Next click the edit button. This should bring up a table where you can fill in the values of L1 and L2. Fill in your X values as L1 and your Y values as L2.
2. Create a scatter plot: To create a scatter plot on your calculator, click the STAT PLOT button. This should pull up a list of options. Scroll to plot one and click it. Turn the plot on. It will then give you options for the type, X list, and Y list. For type, click the first one. For X list it should be set to L1 and for Y list it should be set to L2. Next choose the mark you would like to show up on the plot.
3. Create the LSRL: To create an LSRL and plot it on your scatter plot, click the STAT button again. Scroll to the CALC list and click LinReg(ax+b). //The first thing you put into this is the list of your X values. In this case it is L1. To type in L1, hit the second key and then the one key. Type a comma next, and then you want to fill in your list containing the Y values. In this case it is L2. To type L2, click the second key and then the two key. Add another comma. (this is only if you have more lists than two, otherwise your calculator automatically assumes L1 and L2 and you can just skip to the next step)// Next you want to type Y1 to specify that you want the equation in the Y1 spot on your graphing application. To type in Y1, click VARS. Scroll to the Y-VARS section and then hit the first option, which should say function. Click Y1. Then click enter and this should put a line of best fit (LSRL) on your graph and the equation will be displayed on your screen.

//http://msdn.microsoft.com/en-us/library/ms174824.aspx// (for linear regression picture) //http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=9&ved=0CHkQFjAI&url=http%3A%2F%2Fhomepage.smc.edu%2Fmcgraw_colleen%2Fmath_52%2Fcalcnotes%2FLinear%2520Regression%2FTI%252083%2520Linear%2520Regression.pdf&ei=WZKQT5yCJuaW6AHNtNyUBA&usg=AFQjCNFFs0JoDAq60GUlWfbepP66IFo4bg&sig2=4cUt0tt_3I-6SBmkcqe9YQ// (for first two and the third calculator pictures) //http://pages.central.edu/emp/lintont/ti83/html/linreg/linreg.html// (for second two calculator pictures)

__Quadratic Regression:__ as one variable increases or decreases, the other variable increases or decreases in a parabolic fashion

 * equation for best fit: y=ax^2+bx+c[[image:Screen_shot_2012-04-19_at_6.20.44_PM.png width="301" height="267" align="right"]]
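
For the curious, one way to find a, b and c by hand is to solve the three "normal equations" that come from minimizing the sum of squared vertical distances. The sketch below assumes that approach; the function names and the sample data are made up, and a calculator may use a different numerical method internally.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def quadratic_fit(xs, ys):
    """Return (a, b, c) for y = a*x^2 + b*x + c by least squares."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # s[k] = sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # t[k] = sum of y*x^k
    matrix = [
        [s[4], s[3], s[2]],
        [s[3], s[2], s[1]],
        [s[2], s[1], s[0]],
    ]
    rhs = [t[2], t[1], t[0]]
    a, b, c = solve_linear_system(matrix, rhs)
    return a, b, c


# Made-up data lying exactly on y = 2x^2 - 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 0, 3, 10, 21]
a, b, c = quadratic_fit(xs, ys)
print(f"y = {a:.2f}x^2 + {b:.2f}x + {c:.2f}")
```

Because the sample points lie exactly on a parabola, the fit should recover its coefficients; real data will leave some residual error.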

How to find Quadratic Regression on your calculator:
1. Follow step one from linear regression (making a list).
2. Follow step two from linear regression (making a scatter plot).
3. Finding Quadratic Regression: Finding quadratic regression is very similar to finding linear regression. First you click the STAT button. Scroll to the CALC menu and click the fifth option, which should be QuadReg. As before, you can type in L1 and L2 to specify the lists, but this is still only necessary if you have more than two lists stored on your calculator. Next you type in the Y1 variable to specify that the equation should show up in that spot on your graphing application. To do this, go to VARS. Under Y-VARS, click the option labeled Function. The first option should be Y1. Click enter again and this should bring up the equation on your screen and add the quadratic regression curve to your graph.

[[image:Screen_shot_2012-04-19_at_7.11.03_PM.png width="227" height="164"]]
//http://calculator.maconstate.edu/quad_regression/index.html// (for quadratic regression calculator pictures) //http://www.ltcconline.net/greenl/courses/103a/keys/exam2PracticeExam/key.htm// (for first quadratic regression picture)

__Exponential Regression:__ as one variable increases or decreases, the other variable increases or decreases in an exponential fashion

 * equation for best fit: y=a(b^x)
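
One common way to fit y=a(b^x) is to take the natural log of the y values, fit a straight line to (x, ln y), and then convert the slope and intercept back. The sketch below assumes that log-transform approach and requires every y value to be positive; the function name and data are made up for illustration.

```python
import math


def exponential_fit(xs, ys):
    """Return (a, b) for y = a * b**x by fitting a line to (x, ln y).
    Assumes all y values are positive."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxl / sxx
    intercept = mean_l - slope * mean_x
    # ln y = intercept + slope * x  means  y = e^intercept * (e^slope)^x
    return math.exp(intercept), math.exp(slope)


# Made-up data lying exactly on y = 3 * 2^x
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]
a, b = exponential_fit(xs, ys)
print(f"y = {a:.2f}({b:.2f}^x)")
```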

**How to Find Exponential Regression on Your Graphing Calculator:**
1. Follow step one from linear and quadratic regression (making a list).
2. Follow step two from linear regression (making a scatter plot).
3. Finding Exponential Regression: Finding exponential regression is extremely similar to finding quadratic and linear regression. In order to find the exponential regression equation and graph, you must first click the STAT button. Under the CALC drop menu there should be an option for ExpReg. Click this. When this comes up, type in the Y1 variable. To type in the Y1 variable, click on the VARS key and then click function in the drop menu. Y1 should be the first option. Then, click enter again and this will display the equation and put it on your graph. (Once again, it is unnecessary to type in L1 and L2 unless you have more than those two lists stored in your calculator as explained above)

//http://mathbits.com/mathbits/tisection/statistics2/correlation.ht// (for sample exponential regression picture) download.intel.com/education/Common/en/.../track_worksheet.doc (for exponential calculator pictures)

=**-Relation to Correlation-**=
Correlation Coefficient (R)
 * after using a calculator to determine the line or curve that best fits the data set, an R value is also given
 * strong R values are those that are close to 1 or -1
 * no correlation will give an R value close to 0
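
To see where the R value comes from for a linear fit, here is a small sketch that computes it from the data. The function name `correlation` and both data sets are made up for this example.

```python
import math


def correlation(xs, ys):
    """Return the linear correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]

# Made-up data with a loose upward trend
r_loose = correlation(xs, [2, 4, 5, 4, 5])
print(f"loose upward trend: R = {r_loose:.3f}")

# Made-up data that falls exactly on a downward line
r_perfect = correlation(xs, [10, 8, 6, 4, 2])
print(f"perfect downward line: R = {r_perfect:.3f}")
```

The sign of R matches the direction of the trend, and the size of R reflects how tightly the points follow a line.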

Coefficient of Determination (R^2) [[image:Screen_shot_2012-04-19_at_6.21.39_PM.png align="right"]]

 * the closer an R^2 value is to 1, the more confident one can feel that they can get a good prediction of one variable based on the other one in a scatterplot

Both the R and the R^2 values are representations of how well the data fits the line or curve, or how accurately one can predict a value based on the other variable.
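
As a sketch of how R^2 can be computed for any fitted line or curve, the example below compares the leftover error around the fit with the total variation in the y values. The function name `r_squared`, the data, and the line y = 0.6x + 2.2 (the least squares line for this made-up data) are illustration values only.

```python
def r_squared(xs, ys, predict):
    """Return R^2 = 1 - (residual variation) / (total variation)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))  # error left after the fit
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total variation in y
    return 1 - ss_res / ss_tot


# Made-up data and its least squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def lsrl(x):
    return 0.6 * x + 2.2


r2 = r_squared(xs, ys, lsrl)
print(f"R^2 = {r2:.2f}")
```

For a least squares line, this R^2 equals the square of the R value for the same data.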

//http://hotmath.com/hotmath_help/topics/exponential-regression.html// (for picture)

Influential Data Points

 * A point on the scatter plot that lies apart from the pattern of the line or curve that the rest of the data is following is referred to as an influential data point
 * this type of point will greatly impact the value of the correlation coefficient, coefficient of determination, y-intercept and slope of the best fitting curve or line
 * this is similar to an outlier in a data set, in that it is far off from the rest of the values

Below is an example of an influential data point on a scatter plot: (notice how drastically different the LSRL is when the influential data point is ignored, as compared to when it is included)

http://www.ats.ucla.edu/stat/spss/examples/ara/foxch11.html (for picture)
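
The same effect can be checked numerically. The sketch below fits a least squares line to made-up data twice, once with a single far-off point and once without it; the function name and all data values are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data that roughly follows y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# The same data plus one point far from the pattern
xs_with = xs + [15]
ys_with = ys + [5]

slope_without, intercept_without = least_squares_line(xs, ys)
slope_with, intercept_with = least_squares_line(xs_with, ys_with)

print(f"without the point: y = {slope_without:.2f}x + {intercept_without:.2f}")
print(f"with the point:    y = {slope_with:.2f}x + {intercept_with:.2f}")
```

A single point that sits far from the others in the x direction can pull the slope almost flat, which is why it is worth checking the scatter plot before trusting the equation.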

=Sample Questions=
1) A certain automobile costs $15,000 when first purchased. The value of the automobile depreciates at the rate shown in the above table. Based on the least squares regression line, what is the value, to the nearest hundred dollars, of the automobile when t=4?

(a) $5400 (b) $5500 (c) $5600 (d) $6400 (e) $7000

2) Which of the following equations best models the data in the table above?

(a) y = -2.35(1.38)^x (b) y = -1.38(2.35)^x (c) y = 1.38(2.35)^x (d) y = 2.35(1.38)^x (e) y = 1.38x^2.35

3) C = -1.78F + 98.63

The linear regression model above is based on an analysis of nutritional data from 14 varieties of cereal bars to relate the percent of calories from fat (F) to the percent of calories from carbohydrates (C). Based on this model, which of the following statements must be true?

I. There is a positive correlation between C and F. II. When 15 percent of calories are from fat, the predicted percent of calories from carbohydrates is approximately 72. III. The slope indicates that as F increases by 1, C decreases by 1.02.

(a) II only (b) I and II only (c) I and III only (d) II and III only (e) I, II, and III

http://sat.collegeboard.org/practice/practice-test-section-start?practiceTestSectionIDKey=Subject.MATH_LEVEL_2&pageId=practiceSubjectTestMathLevel2&header=Mathematics%20Level%202&subHeader=SAT%20Subject%20Test%20in%20Mathematics%20Level%202%20Practice&conversationId=ConversationStateUID_1