
DATA AND VARIABLES

Data Set: Collection of data values or data points

N = number of data points or values in the set

Frequency: the number of times a specific data point or value is repeated

Relative Frequency: gives frequency for each data value as a percentage of the total data set

Variable: any characteristic that varies with the members of a population

BASIC CLASSIFICATIONS OF VARIABLES

Numerical (Quantitative) Variable: a variable that is a measurable quantity

o Continuous: the difference between two values of the variable can be arbitrarily small; values are often expressed as a range (ex: Height, Foot Size, Mile Run Time)

o Discrete: values of the variable change by minimum increments (ex: Shoe Size, IQ, SAT Score, Points scored in a basketball game)

Categorical (Qualitative) Variable: a variable that cannot be measured numerically (ex: Race, Nationality, Gender, Hair Color)

QUESTION SET 1

Identify the following variables as categorical, discrete or continuous.

1.1 Occupation

1.2 Weight

1.3 Region of residence

1.4 Family Size

1.5 Education level

1.6 Number of automobiles owned

REPRESENTING DATA SETS

List: All N data points or values are listed (can be in ascending, descending, or random order)

Frequency table: data values paired with the number of times that value is repeated.

Do not list data values of frequency zero
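As a minimal sketch of how a frequency table with relative frequencies can be built, the Python snippet below counts a small made-up list of quiz scores. The list `scores` and the helper name `frequency_table` are illustrative assumptions, not part of the handout.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency, relative frequency in %) rows, in increasing order of value."""
    counts = Counter(values)   # value -> number of times it appears
    n = len(values)            # N = number of data points
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]

# Hypothetical data set, for illustration only
scores = [7, 8, 8, 10, 7, 8, 9, 10, 8, 7]

print("Value  Frequency  Relative")
for value, freq, rel in frequency_table(scores):
    print(f"{value:>5} {freq:>10} {rel:>8.1f}%")
```

Because only values that actually occur are counted, values of frequency zero never appear in the table.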

Basic Graphs and Charts

Bar Graph: Plots the data values, in increasing order, and frequency for each data point.

Axes = Data Values and Frequencies (Usually Frequencies are vertical)

Bars DO NOT TOUCH

A more visual representation of a frequency table; unlike the table, it can also show data values with a frequency of 0

Histogram

A type of bar graph for continuous numerical variables

Bars now touch each other, and relative frequencies can be used in place of frequencies

Each bar may cover a range of data values and show their combined frequency.

Pie Chart: Uses relative frequencies (percentages) for the sectors of each data group
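Each sector's central angle is its relative frequency applied to the full 360 degrees of the circle. The sketch below illustrates that arithmetic in Python; the category counts and the helper name `central_angles` are hypothetical and chosen only for illustration.

```python
def central_angles(frequencies):
    """Map each category to the central angle (in degrees) of its pie sector."""
    total = sum(frequencies.values())
    # angle = (frequency / total) * 360, rounded to the nearest tenth
    return {name: round(360 * count / total, 1) for name, count in frequencies.items()}

# Hypothetical category counts, for illustration only
counts = {"Walk": 5, "Bus": 10, "Car": 5}
angles = central_angles(counts)
print(angles)
```

Apart from small rounding differences, the angles should always add up to 360.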

QUESTION SET 2

2.1 The table below contains the scores on a Chemistry 103 final exam consisting of ten questions worth ten points each. Complete the frequency table for this exam.

Student ID Score Student ID Score

1362 90 4315 10

1486 90 4719 0

1721 70 4951 70

1932 60 5321 40

2489 60 5872 50

2766 80 6533 70

2877 80 6921 70

2964 60 8317 90

3217 80 8854 80

3588 100 8964 70

3780 90 9158 90

3921 90 9347 80

2.2 Suppose the grading scale for the Chemistry exam (above) is A: 80-100, B: 70-79, C: 60-69, D: 50-59 and F: 0-49. Find the grade distribution for the exam.

Frequency Table of Chemistry Grade Distribution

Grade A B C D F

Frequency

2.3 The table to the right shows the grade distribution for a recent civics test. Find the relative frequency for each grade from the civics test.

Frequency Table for Civics Grade Distribution

Grade A B C D F

Frequency 3 7 11 2 1

2.4 Use the appropriate inferential test to determine if there is a linear relationship between chemistry and civics grades (in 2.2 and 2.3 above). If yes, what type of correlation is it, and is the correlation statistically significant?

2.5 The bar graph describes the scores of a group of students on a 10-point math quiz.

a. How many students took the math quiz?

b. What percentage of the students scored 2 points?

c. If a grade of 6 or more was needed to pass the quiz, what percentage of the students passed?

2.6 The pie chart to the right shows the possible causes of death among 18-22 year olds.

a. Is cause of death a quantitative or qualitative variable?

b. Based on the data provided in the pie chart, estimate the number of 18- to 22-year-olds in the population studied who died as the result of an accident (round to the nearest whole number).

c. Conduct an appropriate inferential test to test the null hypothesis that “18- to 22- year-olds are equally likely to die from accidents, homicides, suicides, cancer, heart disease, and other”

2.7 The pie chart below represents the breakdown of a federal government’s $2.9 trillion budget in the last fiscal year. Calculate the size of the central angle in degrees for each wedge of the pie chart (round to the nearest tenth).

2.8 The following is the frequency table for the musical aptitude scores for 1st grade students.

Aptitude Score 0 1 2 3 4 5

Frequency 24 16 20 12 5 3

0 = no musical aptitude

5 = extremely talented

a. What are the data values in this problem?

b. How many students took the aptitude test?

c. What percent of the students tested showed no musical aptitude?

d. What percent of students showed approximately average musical aptitude (scored a 2 or 3)?

e. After participating in a class in music fundamentals, the students were given the aptitude test again with the following results:

Test Score 0 1 2 3 4 5

Frequency 12 19 20 17 8 4

Conduct an appropriate inferential test to accept or reject the null hypothesis “the class did not significantly affect the scores of the students”

NUMERICAL SUMMARIES OF DATA

Xi – The upper case X with the subscript i represents the ith data point in a population data set.

xi – The lower case x with the subscript i represents the ith data point in a sample data set.

i – The subscript letter i is used to locate (or “indicate”) the position of a data point in a set of data that is sorted from the least value to the greatest value.

NUMBER OF DATA POINTS

N – The upper case N is used to represent the number of data points in a population data set

n – The lower case n is used to represent the number of data points in a sample data set.

MEAN: The mean (or average) of a data set is found by dividing the sum of all values in the data set by the number of values in the data set (Data does not have to be sorted to find the mean)

μ – The lower case Greek letter μ is used to represent the mean (or average) of a population data set.

x̄ – The lower case x with a bar over the top (read “x-bar”) is used to represent the mean of a sample data set.

MEDIAN: If we sort the data in order from least to greatest, the median is the data point that is found in the exact middle of the sorted data (Data MUST be sorted to find the median)

M – The upper case M is used to represent the median of any data set.

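Finding the median: Count inward from the min and max until you end up in the middle, or divide n by 2 to locate the middle position.

If the number of data points n is ODD, then the median is an actual data value: round n/2 up to the next whole number and take the data point at that position, $M = X_{\lceil n/2 \rceil}$

If the number of data points n is EVEN, then the median is the average of the two middle data values: $M = (X_{n/2} + X_{n/2+1})/2$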

MODE: The mode of a data set is the value that has the highest frequency of occurrence (repeated).

There can be multiple modes in a data set if two (or more) data points have the highest frequency.

If a data set has no repeated values, then there is no mode for that data set.
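The three measures above can be sketched in Python with the standard `statistics` module. The data set `sample` and the helper `modes` are illustrative assumptions. The helper is written to follow this handout's convention that a data set with no repeated values has no mode, which differs from `statistics.multimode` (it would return every value in that case).

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """All values tied for the highest frequency; empty list if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no repeated values, so no mode under the handout's convention
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical sample, for illustration only (median() sorts internally)
sample = [4, 1, 3, 3, 8, 5]

print("mean   =", mean(sample))    # sum of values / number of values
print("median =", median(sample))  # average of the two middle values (n is even)
print("modes  =", modes(sample))
```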

QUESTION SET 3

3.1 Consider the sample data set { –7.8, –4.5, –14.8, 5.8, 5.8, 0.2, –14.8, –6.6}

a. What is the size of the data set?

b. Sort the data set from least to greatest

c. Find the first data point x1 =

d. Find the fifth data point x5 =

e. Find the mean

f. Find the median M =

g. Find the mode mode =

3.2 Find the mean, median and mode of each sample data set.

a. { 3, 4, 5, 6, 7, 8, 9, 10}

x̄ = __________ M = __________ mode = __________

b. { 3, 5, 8, 11, 14, 15, 16, 17, 18}

x̄ = __________ M = __________ mode = __________

3.3 The frequency table shows the scores of a quiz consisting of three questions worth 10 points each.

a. What is the size of the data set? n = __________

b. Find the mean, median and mode of the data set.

x̄ = ___________ M = ___________ mode = ___________

PERCENTILE: the pth percentile of a data set is a data value such that p% of the data is at or below that value and the rest of the data is at or above it.

FINDING PERCENTILE: There are three steps to finding the pth percentile.

Step 1. Sort the data xi in order from the least value to the greatest value.

Step 2. Find the locator i for the pth percentile: $i = (p/100) \cdot n$, where n is the total number of values.

Step 3. Find the pth percentile. The percentile depends on whether or not the locator i is a whole number.

• If i is a whole number, then the pth percentile is the average of the ith data value, $X_i$, and the data value after it, $X_{i+1}$: percentile $= (X_i + X_{i+1})/2$

• If i is NOT a whole number, round i up to the next whole number, $i^+$, and the pth percentile is $X_{i^+}$, the next available data value after position i.

Example: Find the 20th percentile and 90th percentile for

{89, 79, 43, 96, 72, 88, 95, 54, 77, 99, 56, 98, 61, 62, 66, 85, 68, 69, 93, 78, 99, 69, 70, 87, 71 }

Step 1:

{43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99 }

Step 2:

20th Percentile: Multiply 20% times the total number of scores,

0.20 x 25 = 5 (the index)

90th Percentile: Multiply 90% times the total number of scores,

0.90 x 25 = 22.5 (the index)

Step 3:

If i is a whole number (5), average the 5th and 6th values in the ordered data set.

Counting from left to right (from the smallest to the largest value in the data set), find the 5th value in the data set (62), and the 6th value in the data set (66).

Average the two values (62 + 66) ÷ 2 = 64.

The 20th percentile is 64

If i is not a whole number (22.5), round up to the next whole number, 23.

Counting from left to right (from the smallest to the largest value in the data set), find the 23rd value in the data set.

That value is 98, the 90th percentile for this data set
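The three steps can be sketched in Python as follows. The function name `percentile` is an illustrative choice, and the sketch assumes p is a whole number strictly between 0 and 100 so that the locator can be computed with exact integer arithmetic. It reproduces the two results worked out above.

```python
def percentile(data, p):
    """pth percentile using the handout's locator rule (integer p, 0 < p < 100)."""
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be a whole number strictly between 0 and 100")
    x = sorted(data)                       # Step 1: sort least to greatest
    n = len(x)
    # Step 2: locator i = (p / 100) * n, split into whole part and remainder
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Step 3a: i is whole, so average the ith and (i+1)st values (1-based)
        return (x[whole - 1] + x[whole]) / 2
    # Step 3b: i is not whole, so round up and take that value
    return x[whole]                        # 0-based index `whole` is position whole + 1

scores = [89, 79, 43, 96, 72, 88, 95, 54, 77, 99, 56, 98, 61,
          62, 66, 85, 68, 69, 93, 78, 99, 69, 70, 87, 71]

print(percentile(scores, 20))  # 64.0
print(percentile(scores, 90))  # 98
```

Other textbooks and software packages use slightly different percentile conventions (for example, interpolation), so their answers may not match this rule exactly.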

QUESTION SET 4

Below is a chart of sorted GPAs

3.33 3.35 3.41 3.42 3.45 3.57 3.62 3.65 3.67 3.71 3.76 3.82 3.88 3.91 4.0

4.1 Consider the sorted GPAs

a. Find the 80th percentile

b. Find the 55th percentile

4.2 Athletes with GPAs in the 80th percentile or above will earn a $5000 scholarship. Which GPAs earned a $5000 scholarship?

4.3 Athletes with GPAs from the 55th to the 80th percentile will get a $2000 scholarship. Which GPAs earned a $2000 scholarship?

QUARTILE: units of 25% of the data values

1st Quartile = 25th Percentile or Q1 “Halfway between Median and Minimum”

2nd Quartile = 50th Percentile or MEDIAN, M

3rd Quartile = 75th Percentile or Q3 “Halfway between Median and Maximum”

FIVE-NUMBER SUMMARY

1) MIN: Minimum Value (0 Percentile)

2) Q1: 1st Quartile (25th Percentile)

3) M or Q2: Median (50th Percentile)

4) Q3: 3rd Quartile (75th Percentile)

5) MAX or Q4: Maximum Value (100th Percentile)

VISUALIZING FIVE-NUMBER SUMMARY

BOX PLOT (Box and Whisker Plot)

Boxes = Range: Q1 to Median and Median to Q3

Whiskers = Min to Q1 and Q3 to Max
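Since the handout defines Q1, M, and Q3 as the 25th, 50th, and 75th percentiles, a five-number summary can be sketched by reusing the percentile rule from the previous section. The code below applies it to the 25 scores from the percentile example. The names `locate` and `five_number_summary` are illustrative assumptions, and other quartile conventions may give slightly different values.

```python
def locate(sorted_values, p):
    """pth percentile of already-sorted data, using the handout's locator rule."""
    whole, remainder = divmod(p * len(sorted_values), 100)
    if remainder == 0:
        # whole-number locator: average that value and the next one
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # otherwise round the locator up and take that value
    return sorted_values[whole]

def five_number_summary(data):
    """Return (Min, Q1, M, Q3, Max)."""
    x = sorted(data)
    return (x[0], locate(x, 25), locate(x, 50), locate(x, 75), x[-1])

scores = [89, 79, 43, 96, 72, 88, 95, 54, 77, 99, 56, 98, 61,
          62, 66, 85, 68, 69, 93, 78, 99, 69, 70, 87, 71]

summary = five_number_summary(scores)
print(dict(zip(["Min", "Q1", "M", "Q3", "Max"], summary)))
```

These five values are exactly what a box plot draws: the box runs from Q1 to Q3 with a line at M, and the whiskers extend to Min and Max.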

QUESTION SET 5

5.1 Find the five-number summary, mean, and mode of the data set:

{65, 68, 70, 71, 73, 73, 74, 76, 78, 81, 81, 85, 86, 87, 89, 90, 91, 91, 93, 95}

Min =

Q1 =

M =

Q3 =

Max =

MEAN =

Mode =

5.2

Frequency Table for Chemistry 103 Exam

Score 10 50 60 70 80 100

Frequency 1 3 7 7 4 2

Min =

Q1 =

M =

Q3 =

Max =

MEAN =

Mode =

MEASURES OF SPREAD

RANGE, R: R = Max – Min

Represents the spread of ALL data values

INTERQUARTILE RANGE, IQR: IQR = Q3 – Q1

Represents the spread of the MIDDLE 50% of the data values

OUTLIER: an extreme data point that does not fit into the overall pattern

CALCULATING AN OUTLIER: Use the IQR

A value is an outlier if Value > Q3 + 1.5 × IQR or Value < Q1 - 1.5 × IQR
Example:
Data set: {71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69}
Sort values least to greatest: {69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300}
Calculate median: 70.5
Calculate the lower quartile: 6 points lie below the median and 6 points lie above it. To find the lower quartile, average the two middle points of the bottom six. Points 3 and 4 of the bottom six are both 70, so their average is (70 + 70) / 2 = 70. Q1 = 70
Calculate the upper quartile: the two middle points of the six points above the median are 71 and 72. Their average is (71 + 72) / 2 = 71.5. Q3 = 71.5
Find the interquartile range: subtract Q1 from Q3: 71.5 - 70 = 1.5
Find the inner fences for the data set: multiply the interquartile range (1.5) by 1.5 to get 2.25. Add this number to Q3 and subtract it from Q1 to find the boundaries of the inner fences as follows:
71.5 + 2.25 = 73.75
70 - 2.25 = 67.75
The boundaries of our inner fence are 67.75 and 73.75
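Find the outer fences for the data set: multiply the interquartile range by 3 (1.5 × 3 = 4.5). Add the result to Q3 and subtract it from Q1 to find the boundaries of the outer fences:
71.5 + 4.5 = 76
70 - 4.5 = 65.5
The boundaries of our outer fence are 65.5 and 76
Any data points outside the outer fences are considered major outliers; points outside the inner fences but inside the outer fences are minor outliers. Here 300 lies above the upper outer fence (76), so it is a major outlier.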
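The worked example above can be checked with a short Python sketch. It takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, as the example does. The function name `fences` is an illustrative assumption, and for an odd number of points this sketch leaves the middle value out of both halves, which is one common convention rather than something the handout specifies.

```python
from statistics import median

def fences(data):
    """Return (Q1, Q3, IQR, inner fences, outer fences) for a data set."""
    x = sorted(data)
    half = len(x) // 2
    q1 = median(x[:half])     # median of the lower half
    q3 = median(x[-half:])    # median of the upper half
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return q1, q3, iqr, inner, outer

data = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
q1, q3, iqr, inner, outer = fences(data)

# Major outliers lie beyond the outer fences
major = [v for v in data if v < outer[0] or v > outer[1]]
# Minor outliers lie beyond the inner fences but within the outer fences
minor = [v for v in data
         if (v < inner[0] or v > inner[1]) and outer[0] <= v <= outer[1]]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("inner fences:", inner)
print("outer fences:", outer)
print("major outliers:", major, "minor outliers:", minor)
```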
QUESTION SET 6
6.1 Calculate the five number summary for the data set:
{-7, -5, -4, -2, 0, 1, 3, 4, 5, 6, 7, 8, 8, 9}
Identify the following
6.2 Mean:
6.3 Mode:
6.4 Range:
6.5 IQR:
6.6 Upper Outlier Values:
6.7 Lower Outlier Values:
STANDARD DEVIATION: The most important and most commonly used measure of spread. In simple terms, the standard deviation of a data set is the “average deviation from the mean.”
σx – The lower case Greek letter σ is used to represent the standard deviation of a population data set.
sx – The lower case s is used to represent the standard deviation of a sample data set.
Calculating standard deviation is a multi-step process, so the formulas are shown with the steps below.
There is a difference between the calculations for population and sample standard deviations.
Step 1: Find the MEAN of the data set.
Step 2: Find the DEVIATION (difference) from the mean of each value in the data set.
Deviation = (Data Value – Mean)
Step 3: Find the VARIANCE of the data set. Square the deviations and add them together, then divide that total by N = size of population or n – 1 = one less than sample size.
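**Population variance**: $\sigma_x^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
**Sample variance**: $s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
**VARIANCE can be found by SQUARING the STANDARD DEVIATION.**
Step 4: Find the STANDARD DEVIATION of the data set by taking the square root of the variance.
**SD of population**: $\sigma_x = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
**SD of sample**: $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Note the difference in divisors between the population and sample standard deviations: N for a population, n – 1 for a sample.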
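The four steps can be sketched in Python as below. The data set `values` and the function name `std_dev` are illustrative assumptions; the results are compared against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample) as a sanity check.

```python
import math
import statistics

def std_dev(data, sample=True):
    """Standard deviation following the four steps; sample=False gives the population SD."""
    n = len(data)
    mean = sum(data) / n                              # Step 1: mean
    deviations = [value - mean for value in data]     # Step 2: deviations from the mean
    squared_total = sum(d * d for d in deviations)    # Step 3: sum of squared deviations...
    divisor = n - 1 if sample else n                  # ...divided by n - 1 (sample) or N (population)
    variance = squared_total / divisor
    return math.sqrt(variance)                        # Step 4: square root of the variance

# Hypothetical data set, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("population SD:", std_dev(values, sample=False))
print("sample SD:    ", std_dev(values))
print("library check:", statistics.pstdev(values), statistics.stdev(values))
```

For this data the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2, while the sample variance is 32 / 7 and the sample standard deviation is slightly larger.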
QUESTION SET 7
7.1 For the data set: {12, 18, 19, 23, 27, 31, 36}
a. Find the mean
b. Find the standard deviation
7.2 For the data set:
{82, 82, 91, 91, 70, 88, 53, 88, 82, 70, 52, 93, 52, 93, 67, 91, 64, 90, 93, 70, 91, 75}
a. Find the mean
b. Find the range
c. Find the standard deviation
7.3 For the data set: {30, 36, 40, 49, 53, 67, 71, 73, 75, 93}
a. Find the range
b. Find the Interquartile range
c. Are any values outliers?
7.4 A farmer is testing the effects of four different fertilizers on the yields of a certain variety of tomato plants. Each of the four fertilizers is applied to five different tomato plants, and the number of tomatoes produced by each plant is recorded.
Fertilizer A Fertilizer B Fertilizer C Fertilizer D
33 26 31 29
29 22 36 34
37 16 42 30
39 17 34 31
35 20 30 34
a. Conduct an ANOVA to test the null hypothesis that the means of each fertilizer are all equal.
b. If the means of the four fertilizers are not all equal, conduct a t-test to test each pair of means (note: you will need to know whether the variances of the samples are equal)
