Semester Project Information

The project is due Thursday, May 2, 2024 (the last Thursday before finals week).  You will be given a data set based on the weather in 32 cities in the western United States.  The data set will have three quantitative variables and one qualitative variable.  You may complete the project with computer software, with your calculator, or with a combination of the two.  All graphs must be created using computer software.  You will be graded on how well you complete the following tasks.

1. Self-identification (2 pts): State your name, your class section or class meeting time, and the date including the year.

2. Data Identification (2 pts): State your data source.  The source is given next to the data set.

3. Statistics, Confidence Intervals, and Qualitative Frequency Distribution

a. (12 pts) Fill in a table like the following using your quantitative variables.  Replace "Variable 1", "Variable 2" and "Variable 3" with the names or abbreviated names of your variables.

Statistic | Variable 1 | Variable 2 | Variable 3
Sample Size | | |
Kurtosis | | |
Skewness | | |
Standard Error of the Mean | | |
Standard Deviation | | |
Interquartile Range | | |
Range | | |
Midrange | | |
Mode | | |
Mean | | |
Maximum | | |
Third Quartile | | |
Median | | |
First Quartile | | |
Minimum | | |
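
As a rough illustration of how several rows of this table could be computed, here is a minimal Python sketch that uses only the standard library.  The sample values are made up and are not the project data.  Quartile conventions differ between software packages and calculators, so the quartiles and interquartile range from your own tool may not match this method exactly.  Skewness and kurtosis are left out because their formulas also vary by software.

```python
import math
import statistics

# Hypothetical sample values (NOT the project data), for illustration only.
sample = [52, 47, 61, 58, 47, 66, 70, 55]

n = len(sample)                                # sample size
mean = statistics.mean(sample)
median = statistics.median(sample)
modes = statistics.multimode(sample)           # a list, since a sample can have several modes
std_dev = statistics.stdev(sample)             # sample standard deviation (n - 1 divisor)
std_error = std_dev / math.sqrt(n)             # standard error of the mean
data_range = max(sample) - min(sample)
midrange = (max(sample) + min(sample)) / 2

# One common quartile convention; other software may use a different one.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
```

Whatever tool you use, report the statistics for your own three variables, not the values from this example.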

b. (12 pts) Fill in a table like the following with the requested 95% confidence intervals for your quantitative variables.  Replace "Variable 1", "Variable 2" and "Variable 3" with the names or abbreviated names of your variables.

Confidence Interval | Variable 1 | Variable 2 | Variable 3
Population Standard Deviation | | |
Population Mean | | |
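
For reference, the usual textbook forms of these two intervals are sketched below, assuming the t-based interval for a mean and the chi-square-based interval for a standard deviation (both assume the data are roughly normal).  Here n is the sample size, \bar{x} is the sample mean, s is the sample standard deviation, and \alpha = 0.05 for 95% confidence.  Each critical value is written with the area to its right as the subscript, so \chi^2_{\alpha/2} is the larger of the two chi-square values.  Your textbook may use different notation.

```latex
% Confidence interval for the population mean (n - 1 degrees of freedom)
\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}

% Confidence interval for the population standard deviation (n - 1 degrees of freedom)
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{\alpha/2}}} \;<\; \sigma \;<\; \sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{1-\alpha/2}}}
```

If every one of the 32 cities has a value for the variable, the degrees of freedom are 31.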

c. (6 pts) Using the qualitative variable (the fourth variable), create a qualitative frequency distribution showing both frequencies and relative frequencies.
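
A frequency distribution of this kind can be tallied in a few lines.  The sketch below uses made-up category labels, not the categories in the project data, and only the Python standard library.

```python
from collections import Counter

# Hypothetical category labels (NOT the project data), for illustration only.
labels = ["Coastal", "Desert", "Mountain", "Desert",
          "Coastal", "Desert", "Mountain", "Desert"]

frequencies = Counter(labels)                  # count of each category
total = sum(frequencies.values())
relative = {category: count / total            # proportion of each category
            for category, count in frequencies.items()}

# Print one row per category: name, frequency, relative frequency.
for category in sorted(frequencies):
    print(f"{category:10s} {frequencies[category]:3d} {relative[category]:.3f}")
```

The relative frequencies should add up to 1, which is a quick check on the table.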

4. Graphs and Their Analysis

a. (6 pts) Construct a qualitative bar chart from the frequency distribution from 3c. 

b. (2 pts) Answer the following questions about the qualitative bar chart in paragraph form.

  • How many bars are there?

  • Which bar has the highest frequency?

  • Which bar has the lowest frequency?

c. (18 pts) For each quantitative variable, create a histogram with four to six classes.
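
One way to pick the classes is to divide the spread of the data by the number of classes and round the width up.  The sketch below assumes whole-number data and uses made-up values, not the project data.  It only counts how many values fall in each class, and the histogram itself still has to be drawn with software.

```python
import math

# Hypothetical whole-number values (NOT the project data), for illustration only.
values = [31, 35, 38, 42, 44, 45, 47, 51, 53, 58, 60, 64]
num_classes = 5                      # any choice from four to six meets the requirement

low, high = min(values), max(values)
# Round the width up so that num_classes classes cover every value.
width = math.ceil((high - low + 1) / num_classes)

# Class i covers low + i * width through low + (i + 1) * width - 1.
counts = [0] * num_classes
for v in values:
    counts[(v - low) // width] += 1
```

With these values the width is 7, so the classes are 31 to 37, 38 to 44, and so on.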

d. (6 pts) For each histogram, answer the following questions in paragraph form.

  • What are the largest and smallest data values?

  • Is it symmetric, skewed to the left, or skewed to the right?

  • How many peaks does it have, and where are they located?

  • Does it have any gaps and, if so, where are they?

  • Does it have any extreme values and, if so, what are they?

e. (8 pts) For the two comparable quantitative variables (the first two variables), create side-by-side horizontal box plots.

f. (4 pts) Answer the following questions about the side-by-side box plots.

  • Which plot has the largest value?

  • Which plot has the smallest value?

  • Which plot has the largest median?

  • Which plot is more skewed?

  • Which plot has the largest range?

  • Which plot has the largest interquartile range?

g. (4 pts) Create a single horizontal box plot of the third quantitative variable. 

h. (2 pts) Answer the following questions about the single box plot.

  • Are there any outliers?

  • What is the median value?

  • Is the box plot skewed left, skewed right or symmetric?
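
A common rule for flagging outliers on a box plot is the 1.5 × IQR fence rule, sketched below with made-up values that are not the project data.  This assumes your software uses the same rule and the same quartile convention, which is not guaranteed, so treat the result as a cross-check rather than the final answer.

```python
import statistics

# Hypothetical values (NOT the project data), for illustration only.
values = [12, 14, 15, 15, 16, 17, 18, 19, 21, 40]

# One common quartile convention; other software may use a different one.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Values beyond either fence are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]
```

In this example only the value 40 lies beyond the upper fence.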

i. (12 pts) Using the first and third variables, create a scatter diagram with the regression line.

j. (8 pts) Referring to the scatter diagram you just made, do the following.

  • Decide whether the association between the two variables is strong, moderate or weak, and also whether the association is positive, negative or neither.

  • Identify any outliers or influential observations on the scatter diagram.

  • Find and state the equation of the regression line in the scatter diagram.

  • Find the coefficient of determination and then interpret the coefficient of determination in the context of the scatter diagram you just created.
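
The regression equation and the coefficient of determination can be checked by hand from the usual least-squares sums.  The sketch below uses made-up paired values, not the project data, and treats x as the explanatory variable and y as the response.

```python
# Hypothetical paired values (NOT the project data), for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x        # regression line: y-hat = intercept + slope * x
r = sxy / (sxx * syy) ** 0.5               # linear correlation coefficient
r_squared = r ** 2                         # coefficient of determination
```

The coefficient of determination is the proportion of the variation in y that is explained by the regression line, so a value of 0.6 would mean about 60% of the variation is explained.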

5. Hypothesis Tests: For each of the following hypothesis tests, i) name the test you ran, ii) state the test statistic, iii) state the P-value, and iv) state the conclusion of the test in plain English.

a. (10 pts) Run a chi-square goodness of fit test at the 5% significance level (95% confidence) to see whether or not the outcomes of your qualitative variable are equally likely.
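
When the outcomes are assumed equally likely, every category has the same expected count.  The sketch below computes the goodness of fit test statistic for made-up observed counts that are not the project data.  The Python standard library has no chi-square distribution, so the P-value is left to your software or calculator.

```python
# Hypothetical observed counts (NOT the project data) for three categories.
observed = [14, 10, 8]

total = sum(observed)
k = len(observed)                  # number of categories
expected = total / k               # equally likely outcomes: same expected count everywhere

# Test statistic: sum of (observed - expected)^2 / expected over all categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = k - 1
```

A large test statistic relative to the chi-square distribution with k − 1 degrees of freedom is evidence against equally likely outcomes.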

b. (10 pts) Run a dependent difference test at the 5% significance level (95% confidence) to see if the means of variables 1 and 2 are equal.
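
A dependent difference test works on the city-by-city differences between the two variables.  The sketch below assumes the usual paired t statistic and uses made-up values that are not the project data.  The P-value is again left to your software or calculator.

```python
import math
import statistics

# Hypothetical paired measurements (NOT the project data), one pair per city.
var1 = [70, 65, 80, 75, 68, 72]
var2 = [66, 64, 75, 73, 69, 70]

differences = [a - b for a, b in zip(var1, var2)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)          # sample standard deviation of the differences

# Paired t statistic for the null hypothesis that the mean difference is zero.
t_statistic = mean_diff / (sd_diff / math.sqrt(n))
degrees_of_freedom = n - 1
```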

c. (10 pts) Test at the 5% significance level (95% confidence) to see if there is significant linear correlation between variables 1 and 3.
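
For reference, one standard form of the test statistic for linear correlation is sketched below, assuming the t-based test of the null hypothesis that the population correlation is zero.  Here r is the sample linear correlation coefficient and n is the number of data pairs.  Your software may report an equivalent statistic under a different name.

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{with } n-2 \text{ degrees of freedom}
```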

d. (12 pts) Run the Shapiro-Wilk test at the 5% significance level (95% confidence) for each of the quantitative variables (variables 1 through 3) to see if these variables are normally distributed.  When finished with this step, you will have run three different tests.

6. Table of Data (4 pts): Insert a table showing all your data at the end of the project.
