Scatterplots and Correlation

= __Scatterplots__ =
Written by: Ben Weiss, Jonathan Itskovitch, and Ryan Hall

EXPLANATION OF TOPIC
Scatter plots are among the most useful ways of plotting data. Like a line graph, a scatter plot is a collection of data points placed against a horizontal axis and a vertical axis. Unlike a graph that follows a single scenario, a scatter plot usually combines data from multiple sources (typically one point per source) to show how one variable relates to another. Often there is a consistent, frequently linear trend in the data (e.g. as one variable increases, the other also increases), but there does not have to be. This relationship between the variables is called the correlation. Before the correlation can be determined (assuming that there is one and that the data is not randomly scattered), all the points must be plotted. Once a general correlation is apparent, a line of best fit can be drawn to represent the overall trend. This line is also known as a regression line. Correlation in this sense applies only to linear relationships; it does not describe quadratic or other curved relationships.
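The line of best fit described above can be computed rather than drawn by eye. The following is a minimal sketch using the standard least-squares formulas; the function name `best_fit_line` and the data (hours studied versus test score) are hypothetical and chosen only for illustration.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

slope, intercept = best_fit_line(hours, scores)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 6.00x + 46.00
```

The points do not lie exactly on this line; the line is simply the one that minimizes the total squared vertical distance to the points.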

Scatter plots can be sorted into five categories that are defined by two factors: the direction of the trend (a positive or negative slope) and how close the data points come to forming a straight line.

A correlation is first described as “positive” or “negative”. A positive correlation exists if y values tend to increase as x values increase. On the other hand, a negative correlation exists if y values tend to decrease as x values increase. Positive and negative describe general trends, not perfectly linear graphs. The other factor is how close the plotted points come to forming a straight line, which is measured by the correlation coefficient (r). A perfect line with a positive slope has a coefficient of +1, while a perfect line with a negative slope has a coefficient of -1. The closer the coefficient is to +1 or -1, the closer the data is to being perfectly linear. Likewise, a coefficient closer to 0 means that the data is less linear in nature. A coefficient of 0 means that there is no linear relationship at all, as when the data is randomly scattered. A scatterplot with a coefficient close to +1 or -1 is said to have a high correlation, while a scatterplot with a coefficient close to 0 has a low correlation. Note that the R^2 value is the square of the coefficient, not the coefficient itself. As a result, there are five types of correlation: high positive, low positive, high negative, low negative, and no correlation.
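As a rough illustration of the coefficient and the five categories, the sketch below computes the Pearson correlation coefficient for a list of points and then labels it. The function names and the cutoffs (0.1 for "no correlation" and 0.7 for "high") are assumptions made for this example; there is no single agreed boundary between high and low correlation.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r, a value between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, none_cutoff=0.1, high_cutoff=0.7):
    """Label r as one of the five types. The cutoffs are illustrative assumptions."""
    if abs(r) < none_cutoff:
        return "no correlation"
    strength = "high" if abs(r) >= high_cutoff else "low"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

# Hypothetical data lying exactly on a line with positive slope
r_up = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
# The same x values with y falling instead of rising
r_down = correlation_coefficient([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

print(r_up, describe(r_up))      # 1.0 high positive
print(r_down, describe(r_down))  # -1.0 high negative
```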

Scatterplots can also be used to find the mean, median, and mode of a data set, since the value of each plotted point can be read off the axes.
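Once the coordinates of each point have been read from the plot, these statistics can be computed directly. The sketch below uses Python's standard `statistics` module on a hypothetical set of points; the values are invented for illustration and do not come from any plot on this page.

```python
import statistics

# Hypothetical (x, y) points read off a scatterplot
points = [(1, 20), (1.5, 30), (2, 30), (2.5, 25), (3, 35)]

# Take the y value of every point
y_values = [y for _, y in points]

mean_y = statistics.mean(y_values)      # sum of the values divided by the count
median_y = statistics.median(y_values)  # middle value once sorted
mode_y = statistics.multimode(y_values) # most frequent value(s), as a list

print(mean_y, median_y, mode_y)  # 28 30 [30]
```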



**BASIC SAMPLE QUESTION**
1. Highlight below to show answer and explanation: According to the scatterplot, each time the water temperature rises by 6 degrees, the amount of ice in the river decreases by about 75 tons. The scatterplot shows about 600 tons of ice when the water temperature is -14 degrees, so if the temperature falls another 6 degrees to -20 degrees, there would be about 75 more tons of ice, or about 675 tons in total.
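The arithmetic in this explanation amounts to extending a linear trend. A minimal sketch, assuming only the two figures quoted in the explanation (about 600 tons at -14 degrees, and about 75 tons lost per 6-degree rise); the function name `estimate_ice` is hypothetical:

```python
# Figures taken from the explanation above
ICE_AT_MINUS_14 = 600      # tons of ice at -14 degrees
TONS_PER_DEGREE = -75 / 6  # change in ice for each 1-degree rise

def estimate_ice(temperature):
    """Estimate tons of ice by extending the linear trend from the -14 degree reading."""
    return ICE_AT_MINUS_14 + TONS_PER_DEGREE * (temperature - (-14))

print(estimate_ice(-20))  # 675.0
```

Because the temperature falls by 6 degrees, the negative rate is multiplied by a negative change, which adds about 75 tons to the 600-ton starting point.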

SAT II Math II Type Questions
Highlight under the problem to see the correct answer:

2. According to the data, which of the following statements is true?
A. The salary increases by an average of $2,000 every year
B. The mean salary is $73,600
C. The salary exceeds $75,000 less than half of the time
D. The linear trendline is about y = 5000x + 3000
E. The salary increases by more than $10,000 in a single year only twice

Highlight below to show answer and explanation:

The correct answer is B. If you take the mean of the salaries at all 15 points, you should get exactly this number. Your result may differ slightly if the values you read from the graph do not exactly match those in our data table. A is wrong because the salary increases by more than $2,000 per year; it actually rises $2,666 per year. C is wrong because the salary exceeds $75,000 exactly 8 times, which is more than half of the time. D is wrong because the slope is too steep and the y-intercept is much too low; the trendline is closer to y = 3050x + 49200. E is wrong because such an increase occurs only once.

3. Last week, Interact held its annual Bake-a-thon, a charity event for which students bake cookies for the homeless on their own time. After the event, Mrs. White created a scatter plot of the number of cookies each Interact member made against the amount of time that member spent baking. Based on the resulting scatter plot, which of the following is NOT true?

A. 3 people baked for 2.5 hours
B. For the most part, there is little or no correlation between the amount of time spent baking and the number of cookies made
C. The total number of cookies made could be around 360 (ha! get it?)
D. The people who baked for 2 hours made more cookies combined than those who baked for 1.5 hours combined
E. The median number of cookies made could be 29

Highlight below to show answer and explanation:

**The answer to this question is D. From the graph, you can determine that 3 people baked for 2.5 hours because there are 3 data points at x = 2.5, so A is true. There is little or no correlation between the amount of time spent baking and the number of cookies made because the y-values neither consistently increase nor consistently decrease as the x-values increase; therefore, B is also true. A rough estimate of the total number of cookies baked comes out close to 360 (the actual sum is 355), so C is true. Drawing a line at y = 29 shows that the number of data points above the line and the number below it are about the same, so E is true as well. As a result, the only statement that is not true is D, which can be checked by finding the sum of cookies made for each amount of time.**

WORKS CITED:
http://www.stat.yale.edu/Courses/1997-98/101/scatter.htm
http://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html
http://books.google.com/books?id=cwBCHTz62qcC&pg=PA84&lpg=PA84&dq=sat+ii+math+2+scatter+plots&source=bl&ots=ci64kauIr5&sig=5lsQF5DbQSkjsHAmdDHnFQXIDCE&hl=en&sa=X&ei=QK2NT_LdOKHV0QGnmuCNDw&ved=0CCwQ6AEwATgK#v=onepage&q&f=false