The problem provides a frequency distribution table giving the number of orders ($n_i$) as a function of the order amount ($x_i$) over the last six months of a business. We are asked to identify the nature of the variable, construct the histogram and the cumulative frequency curve, find the modal class, median, quartiles, and deciles, calculate the central moments, the médiale, and the concentration index, and finally draw the concentration curve.
2025/3/18
1. Problem Description
The problem provides a frequency distribution table giving the number of orders ($n_i$) as a function of the order amount ($x_i$) over the last six months of a business. We are asked to identify the nature of the variable, construct the histogram and the cumulative frequency curve, find the modal class, median, quartiles, and deciles, calculate the central moments, the médiale, and the concentration index, and finally draw the concentration curve.
2. Solution Steps
1. What is the observed character? What is its nature?
The observed character is the order amount ($x_i$). It is a continuous quantitative variable because it can take any value within the given intervals.
2. Construct the histogram and the frequency polygon.
We have the following classes and frequencies:
- [1000, 1500[: 4
- [1500, 2000[: 20
- [2000, 2500[: 24
- [2500, 3000[: 28
- [3000, 3500[: 22
- [3500, 4000[: 2
The histogram is constructed by drawing one rectangle per class. The base of each rectangle is the class width (500). Because all classes have the same width, the height can be taken as the frequency.
The frequency polygon is constructed by connecting the midpoints of the top of each rectangle in the histogram.
Midpoints: 1250, 1750, 2250, 2750, 3250, 3750.
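The midpoints and the vertices of the frequency polygon can be sketched in a few lines of Python. The function names are illustrative. Closing the polygon on the x-axis at the midpoints of two empty neighbouring classes (750 and 4250) is an assumed convention, not something stated in the problem.

```python
# Class limits and frequencies taken from the table above.
classes = [(1000, 1500), (1500, 2000), (2000, 2500),
           (2500, 3000), (3000, 3500), (3500, 4000)]
freqs = [4, 20, 24, 28, 22, 2]


def midpoints(classes):
    """Return the midpoint of each [lower, upper[ class."""
    return [(lo + hi) / 2 for lo, hi in classes]


def polygon_vertices(classes, freqs):
    """Vertices of the frequency polygon.

    Assumption: the polygon is closed on the x-axis at the midpoints of
    two empty classes of the same width, one before and one after the table.
    """
    width = classes[0][1] - classes[0][0]
    mids = midpoints(classes)
    start = (mids[0] - width, 0)
    end = (mids[-1] + width, 0)
    return [start] + list(zip(mids, freqs)) + [end]


print(midpoints(classes))
print(polygon_vertices(classes, freqs))
```

The interior vertices are the (midpoint, frequency) pairs, which are the tops of the histogram rectangles at their centres.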
3. Construct the cumulative frequency curve (ogive).
Cumulative frequencies:
- [1000, 1500[: 4
- [1000, 2000[: 4 + 20 = 24
- [1000, 2500[: 24 + 24 = 48
- [1000, 3000[: 48 + 28 = 76
- [1000, 3500[: 76 + 22 = 98
- [1000, 4000[: 98 + 2 = 100
The cumulative frequency curve is obtained by plotting the upper limit of each class against the corresponding cumulative frequency and connecting the points with line segments. The curve starts at the lower limit of the first class, where the cumulative frequency is 0.
Points: (1000, 0), (1500, 4), (2000, 24), (2500, 48), (3000, 76), (3500, 98), (4000, 100).
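As a quick check of the running totals, here is a minimal Python sketch that builds the ogive points. The variable names are illustrative. Starting the curve at (1000, 0) assumes that no order falls below the first class.

```python
from itertools import accumulate

# Upper class limits and frequencies from the table above.
upper_limits = [1500, 2000, 2500, 3000, 3500, 4000]
freqs = [4, 20, 24, 28, 22, 2]

# Running total of the frequencies.
cumulative = list(accumulate(freqs))

# Assumption: the ogive starts at the lower limit of the first class with 0.
ogive_points = [(1000, 0)] + list(zip(upper_limits, cumulative))

print(cumulative)
print(ogive_points)
```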
4. Determine the modal class, the median, the quartiles $Q_1$ and $Q_3$, the deciles $D_1$ and $D_9$.
Total number of orders: $N = \sum n_i = 100$.
*Modal class*: The class with the highest frequency is [2500, 3000[, with a frequency of 28.
*Median*: The median is the value such that 50% of the data is below it. We look for the class that contains the rank $N/2 = 50$. The cumulative frequency is 48 at the upper limit of [2000, 2500[ and 76 at the upper limit of [2500, 3000[, so the median lies in the interval [2500, 3000[.
The median is $Me = L + \frac{N/2 - F}{f} \times w$, where $L$ is the lower limit of the median class (2500), $N$ is the total frequency (100), $F$ is the cumulative frequency of the class before the median class (48), $f$ is the frequency of the median class (28), and $w$ is the class width (500).
Median = $2500 + \frac{50 - 48}{28} \times 500 \approx 2535.71$.
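*Q1*: $Q_1$ is the value such that 25% of the data is below it. $\frac{N}{4} = 25$.
The cumulative frequency is 24 at the upper limit of [1500, 2000[ and 48 at the upper limit of [2000, 2500[, so $Q_1$ lies in [2000, 2500[.
$Q_1 = 2000 + \frac{25 - 24}{24} \times 500 \approx 2020.83$.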
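*Q3*: $Q_3$ is the value such that 75% of the data is below it. $\frac{3N}{4} = 75$.
$Q_3$ lies in [2500, 3000[.
$Q_3 = 2500 + \frac{75 - 48}{28} \times 500 \approx 2982.14$.
*D1*: $D_1$ is the value such that 10% of the data is below it. $\frac{N}{10} = 10$.
$D_1$ lies in [1500, 2000[.
$D_1 = 1500 + \frac{10 - 4}{20} \times 500 = 1650$.
*D9*: $D_9$ is the value such that 90% of the data is below it. $\frac{9N}{10} = 90$.
$D_9$ lies in [3000, 3500[.
$D_9 = 3000 + \frac{90 - 76}{22} \times 500 \approx 3318.18$.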
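The median, quartiles, and deciles all use the same linear interpolation inside a class, so one helper can cover them. The sketch below is illustrative, and `grouped_quantile` is a hypothetical name. It selects the first class whose cumulative frequency reaches the target rank $pN$ and assumes that values are spread uniformly inside each class.

```python
# Class limits and frequencies from the table above.
classes = [(1000, 1500), (1500, 2000), (2000, 2500),
           (2500, 3000), (3000, 3500), (3500, 4000)]
freqs = [4, 20, 24, 28, 22, 2]


def grouped_quantile(classes, freqs, p):
    """Quantile of order p (0 < p < 1) for grouped data.

    Assumption: observations are uniformly spread within each class,
    so the quantile is found by linear interpolation.
    """
    target = p * sum(freqs)          # rank to reach, e.g. 50 for the median
    cum_before = 0                   # cumulative frequency before the class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum_before + f >= target:
            return lo + (target - cum_before) / f * (hi - lo)
        cum_before += f
    return classes[-1][1]


for name, p in [("D1", 0.10), ("Q1", 0.25), ("Me", 0.50),
                ("Q3", 0.75), ("D9", 0.90)]:
    print(name, round(grouped_quantile(classes, freqs, p), 2))
```

Selecting the class by cumulative frequency, not by eye, guards against interpolating in the wrong class: a rank of 25 falls just past the cumulative frequency of 24 reached at 2000, so it belongs to [2000, 2500[.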
5. Calculate the central moments of order 2, 3, and 4. Calculate Fisher's skewness coefficient and Pearson's kurtosis coefficient.
Let's denote the class midpoints by $m_i$: 1250, 1750, 2250, 2750, 3250, 3750.
Calculate the mean: $\bar{x} = \frac{\sum n_i m_i}{\sum n_i} = \frac{4(1250) + 20(1750) + 24(2250) + 28(2750) + 22(3250) + 2(3750)}{100} = \frac{5000 + 35000 + 54000 + 77000 + 71500 + 7500}{100} = \frac{250000}{100} = 2500$.
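Central moment of order 2: $\mu_2 = \frac{1}{N}\sum n_i (m_i - \bar{x})^2 = \frac{36250000}{100} = 362500$.
Central moment of order 3: $\mu_3 = \frac{1}{N}\sum n_i (m_i - \bar{x})^3 = \frac{-3000000000}{100} = -30000000$. The frequencies are not symmetric about the mean (for example, 4 orders in the lowest class against 2 in the highest), so $\mu_3$ does not vanish.
Central moment of order 4: $\mu_4 = \frac{1}{N}\sum n_i (m_i - \bar{x})^4 = \frac{28140625000000}{100} = 281406250000$.
Fisher's skewness coefficient: $\gamma_1 = \frac{\mu_3}{\sigma^3}$, where $\sigma = \sqrt{\mu_2} \approx 602.08$. Since $\mu_3 = -30000000$, $\gamma_1 \approx -0.137$: the distribution is slightly skewed to the left.
Pearson's kurtosis coefficient: $\beta_2 = \frac{\mu_4}{\mu_2^2}$. If $\beta_2 > 3$, the distribution is leptokurtic (heavy tails). If $\beta_2 < 3$, the distribution is platykurtic (light tails). If $\beta_2 = 3$, it is mesokurtic. Here $\beta_2 = \frac{281406250000}{362500^2} \approx 2.14 < 3$, so the distribution is platykurtic.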
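Because the third and fourth powers are large, the moments are easy to get wrong by hand. The following is a minimal Python sketch that recomputes them from the class midpoints. The helper name `central_moment` is illustrative. It uses the definitions $\gamma_1 = \mu_3 / \mu_2^{3/2}$ and $\beta_2 = \mu_4 / \mu_2^2$.

```python
# Class midpoints and frequencies from the table above.
mids = [1250, 1750, 2250, 2750, 3250, 3750]
freqs = [4, 20, 24, 28, 22, 2]


def central_moment(mids, freqs, k):
    """Central moment of order k for grouped data (midpoints as values)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mids, freqs)) / n
    return sum(f * (m - mean) ** k for m, f in zip(mids, freqs)) / n


mu2 = central_moment(mids, freqs, 2)
mu3 = central_moment(mids, freqs, 3)
mu4 = central_moment(mids, freqs, 4)

gamma1 = mu3 / mu2 ** 1.5   # Fisher's skewness coefficient
beta2 = mu4 / mu2 ** 2      # Pearson's kurtosis coefficient

print(mu2, mu3, mu4)
print(round(gamma1, 3), round(beta2, 3))
```

With these frequencies the third moment comes out negative rather than zero, because the lower and upper classes do not balance exactly around the mean of 2500.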
6. Calculate the médiale of this distribution. Deduce the concentration range.
The médiale $M_l$ is the order amount that divides the total amount of orders in half.
Total amount of orders: $\sum n_i m_i$, where $m_i$ here refers to the class midpoint.
We calculated it above as $\sum n_i m_i = 250000$, so half of the total is $125000$.
We want to find the class in which the cumulative order amount first reaches half of the total.
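Class [1000, 1500[: $4 \times 1250 = 5000$.
Class [1500, 2000[: $20 \times 1750 = 35000$.
Class [2000, 2500[: $24 \times 2250 = 54000$.
Class [2500, 3000[: $28 \times 2750 = 77000$.
Class [3000, 3500[: $22 \times 3250 = 71500$.
Class [3500, 4000[: $2 \times 3750 = 7500$.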
So, the total order value up to the upper limit of the first class is 5000. Up to the upper limit of the second class it is $5000 + 35000 = 40000$. Up to the upper limit of the third class it is $40000 + 54000 = 94000$. Up to the upper limit of the fourth class it is $94000 + 77000 = 171000$. Since 125,000 lies between 94,000 and 171,000, the médiale lies within the class [2500, 3000[.
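The solution above only locates the class of the médiale. A point value can be estimated with the same interpolation used for the median, applied to cumulative order amounts instead of cumulative frequencies. The sketch below is illustrative, and `mediale` is a hypothetical helper name. It assumes that order value is spread uniformly inside the class.

```python
# Class limits and frequencies from the table above.
classes = [(1000, 1500), (1500, 2000), (2000, 2500),
           (2500, 3000), (3000, 3500), (3500, 4000)]
freqs = [4, 20, 24, 28, 22, 2]


def mediale(classes, freqs):
    """Amount that splits the total order value into two equal halves.

    Assumption: linear interpolation on the cumulative amounts n_i * m_i.
    """
    amounts = [f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)]
    half = sum(amounts) / 2          # 125000 for this table
    cum_before = 0                   # cumulative amount before the class
    for (lo, hi), a in zip(classes, amounts):
        if a > 0 and cum_before + a >= half:
            return lo + (half - cum_before) / a * (hi - lo)
        cum_before += a
    return classes[-1][1]


print(round(mediale(classes, freqs), 2))
```

Under these assumptions the estimate is $2500 + \frac{125000 - 94000}{77000} \times 500 \approx 2701.30$. If the "concentration range" is read as the gap between médiale and median (one common definition, assumed here and not confirmed by the problem), it would be about $2701.30 - 2535.71 \approx 165.6$.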
7. Calculate its concentration index.
We are asked to calculate the concentration index. Here we assume this means the Gini coefficient, a measure of statistical dispersion that ranges from 0 (no concentration) to 1 (maximal concentration).
$G = 1 - \sum_{i=1}^{k} (p_i - p_{i-1})(q_i + q_{i-1})$, where $p_i$ is the cumulative proportion of the population and $q_i$ is the cumulative proportion of the variable of interest (here, order amount), with $p_0 = q_0 = 0$.
Cumulative frequencies are: 4, 24, 48, 76, 98, 100.
Cumulative relative frequencies are: 0.04, 0.24, 0.48, 0.76, 0.98, 1.
Cumulative total order amount: 5000, 40000, 94000, 171000, 242500, 250000.
Cumulative relative order amount: 0.02, 0.16, 0.376, 0.684, 0.97, 1.
$G = 1 - (0.0008 + 0.036 + 0.12864 + 0.2968 + 0.36388 + 0.0394) = 1 - 0.86552 = 0.13448$.
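The trapezoid form of the Gini coefficient, $G = 1 - \sum (p_i - p_{i-1})(q_i + q_{i-1})$ with $p_0 = q_0 = 0$, can be checked against the cumulative proportions listed above. This is a minimal sketch, and the function name `gini` is illustrative.

```python
def gini(p, q):
    """Gini coefficient from cumulative proportions.

    p: cumulative share of the population (orders)
    q: cumulative share of the variable (order amount)
    Uses G = 1 - sum (p_i - p_{i-1}) * (q_i + q_{i-1}), with p_0 = q_0 = 0.
    """
    total = 0.0
    prev_p, prev_q = 0.0, 0.0
    for pi, qi in zip(p, q):
        total += (pi - prev_p) * (qi + prev_q)
        prev_p, prev_q = pi, qi
    return 1 - total


# Cumulative proportions from the solution above.
p = [0.04, 0.24, 0.48, 0.76, 0.98, 1.0]
q = [0.02, 0.16, 0.376, 0.684, 0.97, 1.0]

print(round(gini(p, q), 5))
```

Each term of the sum is twice the area of one trapezoid under the Lorenz curve, so $1$ minus the sum is twice the area between the curve and the line of equality.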
8. Draw and interpret the concentration curve.
The concentration curve (Lorenz curve) is obtained by plotting the cumulative relative frequency of the population (x-axis) against the cumulative relative frequency of the variable of interest (y-axis). The line of equality is a diagonal line from (0,0) to (1,1), which represents perfect equality.
In this case, we plot the points: (0, 0), (0.04, 0.02), (0.24, 0.16), (0.48, 0.376), (0.76, 0.684), (0.98, 0.97), (1, 1). The Gini coefficient is twice the area between the Lorenz curve and the line of equality.
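Interpretation: The Gini coefficient is 0.13448, which is close to 0, so the Lorenz curve stays near the line of equality. Concentration is low: the total order value is spread fairly evenly across the orders.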
3. Final Answer
1. The observed character is the order amount ($x_i$). It is a quantitative continuous variable.
2. The histogram and frequency polygon are described in the solution steps.
3. The cumulative frequency curve is described in the solution steps.
4. Modal class: [2500, 3000[. Median: 2535.71. $Q_1$: 2020.83. $Q_3$: 2982.14. $D_1$: 1650. $D_9$: 3318.18.
5. $\mu_2 = 362500$. $\mu_3 = -30000000$. $\mu_4 = 281406250000$. Fisher's skewness coefficient $\gamma_1 \approx -0.137$. Pearson's kurtosis coefficient $\beta_2 \approx 2.14$.
6. The médiale lies in the class [2500, 3000[.
7. Gini coefficient (concentration index) = 0.13448.