Mar. 29: Using R, Descriptive Statistics

Introduction Assignment

Due Date: April 5th, 9:30am

Read: Univariate Statistics 1

Read: Univariate Statistics 2 (only read to pg. 11)

Read: Univariate Statistics 3

Guided Summary:

- What is a Histogram? What kind of Histogram is the most common?

A histogram is a graphical representation that organizes a group of data points into user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins.

- Add a photo of a histogram to your site
- Why is the bin size of a histogram important?

The most important parameter of a histogram is the bin width, because it controls the tradeoff between presenting a picture with too much detail (“undersmoothing”) and one with too little detail (“oversmoothing”) relative to the true distribution.
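The effect of bin width can be seen with a small sketch. It is written in Python using only the standard library (an assumption about tooling; the course itself uses R), and both the data set and the helper name `bin_counts` are made up for illustration.

```python
from collections import Counter

def bin_counts(values, bin_width, start=0):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter((v - start) // bin_width for v in values)
    # Map each bin's left edge to the number of values in that bin.
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical data set, invented for this example.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 10, 10, 11, 12, 18]

# Width 1 shows every bump, width 20 hides everything, width 5 sits in between.
for width in (1, 5, 20):
    print(f"bin width {width}:")
    for left, count in bin_counts(data, width).items():
        print(f"  [{left}, {left + width}) {'#' * count}")
```

With a width of 1 the picture is noisy, with a width of 20 all values land in a single bin, and a width of 5 shows the two clusters in the data.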
- What are the five numbers in a 5-number summary?

The five-number summary includes five items:

a) The minimum.

b) Q1 (the first quartile, or the 25% mark).

c) The median.

d) Q3 (the third quartile, or the 75% mark).

e) The maximum.

- How do you determine the first and third quartiles?

Q1 is the median (the middle) of the lower half of the data, and Q3 is the median (the middle) of the upper half of the data. For example, if the sorted data split into the halves (3, 5, 7, 8, 9) | (11, 15, 16, 20, 21), then Q1 = 7 and Q3 = 16.

- What does the length of the “whiskers” in a box plot mean? What do you call the numbers beyond the “whiskers”?

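The upper whisker extends from the third quartile to the largest value that is no greater than the third quartile plus 1.5 times the interquartile range (IQR). The lower whisker extends from the first quartile to the smallest value that is no less than the first quartile minus 1.5 times the IQR.

• Values beyond the whiskers are called outliers.

- How do you calculate the length of the “whiskers”?

For example, if the third quartile is 10 and the IQR is 6, the third quartile plus 1.5 times the IQR is 10 + 1.5*6 = 19. If the largest value that is no greater than 19 is 13, the upper whisker will reach to 13.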
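The quartile and whisker rules above can be sketched as follows. This is a minimal Python example (standard library only, an assumption about tooling) that uses the median-of-halves method described in these notes; other software may use slightly different quartile conventions. The function names are hypothetical, and the data are the ten values from the quartile example plus one made-up extreme value.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]          # lower half (skips the middle value when n is odd)
    upper = xs[(n + 1) // 2 :]    # upper half
    return min(xs), median(lower), median(xs), median(upper), max(xs)

def whisker_ends(values):
    """Return (lower whisker end, upper whisker end, values beyond the whiskers)."""
    xs = sorted(values)
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    lower_whisker = min(x for x in xs if x >= low_fence)
    upper_whisker = max(x for x in xs if x <= high_fence)
    beyond = [x for x in xs if x < low_fence or x > high_fence]
    return lower_whisker, upper_whisker, beyond

data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]
print(five_number_summary(data))   # Q1 = 7 and Q3 = 16, as in the example above
print(whisker_ends(data))          # no values fall beyond the whiskers
print(whisker_ends(data + [40]))   # the made-up value 40 falls beyond the upper whisker
```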

(optional) complete the end of chapter exercises

- What is a measure of central tendency? (use your own words!)

There are three main measures of central tendency: the mode, the median and the mean. Each gives a different indication of the typical or central value in the distribution.

- What is the mean? (use your own words!)

The mean is the arithmetic average of a set of given numbers. The median is the middle score in a set of given numbers. The mode is the most frequently occurring score in a set of given numbers.

- What is the median? (use your own words!)

The median is the “middle” number in a sorted list of numbers.
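As a quick check of these three definitions, here is a small sketch using Python's standard `statistics` module (Python is an assumption about tooling; the scores are made up). One large value is included on purpose to show how it pulls the mean upward while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical scores; the single large value (40) is deliberate.
scores = [2, 3, 3, 4, 5, 6, 40]

average = mean(scores)     # sum of the scores divided by how many there are
middle = median(scores)    # middle value of the sorted list
most_common = mode(scores) # value that occurs most often

print("mean:", average)
print("median:", middle)
print("mode:", most_common)
```

Here the mean (9) is larger than the median (4), which is larger than the mode (3), because the one extreme score affects only the mean.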
- Explain why the mean is larger than the median in both lists of table 2.1

Of the three statistics, the mean is the largest, while the mode is the smallest. Again, the mean reflects the skewing the most. To summarize, generally if the distribution of data is skewed to the right, the mean is greater than the median, which is often greater than the mode; if it is skewed to the left, the mean is less than the median, which is often less than the mode.

- What is variance?

The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread out the data, the larger the variance is in relation to the mean.

- Describe how you calculate it in your own words

Variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. It is a measure of dispersion, meaning it measures how far a set of numbers is spread out from their average value. To calculate it, find the mean, subtract the mean from each value, square each difference, and average the squared differences.

- What is the standard deviation? Describe how you calculate it in your own words

The standard deviation (or σ) is a measure of how dispersed the data are in relation to the mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. It is calculated by taking the square root of the variance.
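The calculation can be followed step by step in a short sketch. It is written in Python (an assumption about tooling) with made-up data, and it uses the population form described above, where the squared deviations are averaged over all n values. The sample form divides by n - 1 instead, which is what `statistics.variance` computes.

```python
from math import sqrt
from statistics import pstdev, pvariance

def population_variance(values):
    """Average of the squared deviations from the mean."""
    m = sum(values) / len(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sum(squared_deviations) / len(values)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var = population_variance(data)   # squared deviations sum to 32, and 32 / 8 = 4
sd = sqrt(var)                    # standard deviation is the square root of the variance

print("variance:", var)
print("standard deviation:", sd)

# Cross-check against the standard library's population versions.
print(pvariance(data), pstdev(data))
```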

(optional) complete the end of chapter exercises

- What number should all proportions add up to?

They add up to 1.

- What are the three levels of measurement?

There are three scales of measurement used in statistical analysis: categorical, ordinal, and continuous.

- Which level of measurement uses a bar chart?

The ordinal.

- The text talks about many tips and rules for graphs. What are three things that make a good, clear bar chart?

Easy comparisons between different variables.

Clarity in displaying trends in data.

Easy determination of the value of a variable.
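The points about proportions and bar charts can be tied together in a small sketch. It is written in Python (an assumption about tooling) with made-up survey responses; the helper name `proportions` is hypothetical. It counts each category, converts the counts to proportions, and prints a simple text bar chart sorted from largest to smallest so the categories are easy to compare.

```python
from collections import Counter

def proportions(values):
    """Return each category's share of the total as a fraction between 0 and 1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical ordinal responses, invented for this example.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

shares = proportions(responses)

# One bar per category, longest first, with the proportion printed beside it.
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    bar = "#" * round(share * 20)
    print(f"{category:<10} {bar} {share:.2f}")

print("total:", sum(shares.values()))
```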