Reputation: 243
I have different quarters like
Quarter GrossMargin
2009 Q1 17.60%
2009 Q1 17.80%
2010 Q2 18.50%
2011 Q1 21.60%
See like this i have big data set. I want to plot this in R. I used qplot(df$Quarter, df$GrossMargin, binwidth=.5)
It came dots at the data points of each occurance. Like in x-axis 2009 Q1 there are correspondingly two data points in Y-axis one at 17.6 and other at 17.8. But i want an average value for each quarter, like in x-axis 2009 Q1 should correspond to the single value in Y-axis (that is average of 17.6 and 17.8). If i try to do mean(GrossMargin), it gives mean of whole column, which is of no use. All my quarters(2009-Q1,Q2,Q3, 2010-Q1,Q2,Q3) are in one column and corresponding values of GrossMargin in other single column. How to do this in R
Order.Id ProductID UnitPrice UnitCost Quantity Order.Date TotalUnitPrice Quarter GrossMargin Customer.Id TotalUnitCost
1 24849 BDM10023 28.87 23.8 1 01-01-2009 28.87 2009 Q1 17.60% 10025 23.8
2 24849 1283484PMR29 77.84 64 2 01-01-2009 155.68 2009 Q1 17.80% 10025 128
3 24850 1283484SST30 25.78 20 4 01-02-2009 103.12 2009 Q1 22.40% 10096 80
4 24852 GFO20015 8.2 6.75 1 01-02-2009 8.2 2009 Q1 17.70% 10062 6.75
5 24852 OCM10018 8.24 6.8 2 01-02-2009 16.48 2009 Q1 17.50% 10062 13.6
6 24852 BDM10021 8.24 6.8 4 01-02-2009 32.96 2009 Q1 17.50% 10062 27.2
7 24852 POW20011 11.33 9.25 4 01-02-2009 45.32 2009 Q1 18.40% 10062 37
8 24852 AP6011 9.22 7.5 2 01-02-2009 18.44 2009 Q1 18.70% 10062 15
9 24852 POW30012 8.24 6.5 2 01-02-2009 16.48 2009 Q1 21.10% 10062 13
10 24852 PPF20017 11.86 10.25 1 01-02-2009 11.86 2009 Q1 13.60% 10062 10.25
11 24853 AP3008 8.2 6.75 1 01-02-2009 8.2 2009 Q1 17.70% 10030 6.75
12 24855 VEV10023 8.2 6.75 1 01-03-2009 8.2 2009 Q1 17.70% 10037 6.75
13 24855 AP6006 7.73 6.3 2 01-03-2009 15.46 2009 Q1 18.50% 10037 12.6
14 24855 AP5010 8.2 6.75 2 01-03-2009 16.4 2009 Q1 17.70% 10037 13.5
15 24856 1283484PMS30 7.21 5.9 2 01-03-2009 14.42 2009 Q1 18.20% 10078 11.8
16 24857 AP4009 7.16 5 2 01-03-2009 14.32 2009 Q1 30.20% 10032 10
17 24857 GFO10014 7.16 5.9 2 01-03-2009 14.32 2009 Q1 17.60% 10032 11.8
18 24858 AP3003 6.17 5 1 01-04-2009 6.17 2009 Q1 19.00% 10243 5
19 24858 OWW3009 10.25 8.75 1 01-04-2009 10.25 2009 Q1 14.60% 10243 8.75
20 24858 BDM10022 6.18 5.1 1 01-04-2009 6.18 2009 Q1 17.50% 10243 5.1
21 24858 AP2008 6.13 5 1 01-04-2009 6.13 2009 Q1 18.40% 10243 5
22 24858 AP5005 6.7 5.25 1 01-04-2009 6.7 2009 Q1 21.60% 10243 5.25
23 24859 POW30012 8.24 6.5 2 01-04-2009 16.48 2009 Q1 21.10% 10052 13
24 24860 POW20011 11.33 9.25 4 01-04-2009 45.32 2009 Q1 18.40% 10019 37
25 24861 POW10010 18.14 15 2 01-04-2009 36.28 2009 Q1 17.30% 13710 30
26 24861 OWW3009 10.25 8.75 1 01-04-2009 10.25 2009 Q1 14.60% 13710 8.75
27 24862 1283484CPN28 13.35 11 4 01-04-2009 53.4 2009 Q1 17.60% 15310 44
Paste this into excel. I have created this excel file using write command in R. There are about more than 10000 records in original file
> str(df)
'data.frame': 29487 obs. of 11 variables:
$ Order.Id : num 24849 24849 24850 24852 24852 ...
$ ProductID : Factor w/ 42 levels "1202020SFB25",..: 24 4 7 29 31 22 36 19 37 39 ...
$ UnitPrice : num 28.87 77.84 25.78 8.2 8.24 ...
$ UnitCost : num 23.8 64 20 6.75 6.8 ...
$ Quantity : num 1 2 4 1 2 4 4 2 2 1 ...
$ Order.Date : Factor w/ 1261 levels "1/1/2009","1/1/2010",..: 1 1 45 45 45 45 45 45 45 45 ...
$ TotalUnitPrice: num 28.9 155.7 103.1 8.2 16.5 ...
$ Quarter : chr "2009 Q1" "2009 Q1" "2009 Q1" "2009 Q1" ...
$ TotalUnitCost : num 23.8 128 80 6.75 13.6 ...
$ GrossMargin : chr "17.6%" "17.8%" "22.4%" "17.7%" ...
$ Customer.Id : num 10025 10025 10096 10062 10062 ...
> dput(df)
15097, 15097, 12466, 12466, 15104, 15104, 15104, 15104, 15104,
15104, 15104, 15104, 15104, 15104, 15104, 15104, 15000, 15099,
15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099,
15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099,
15099, 15099, 15099, 14546, 14546, 14546, 14546, 14546, 15349,
15349, 15349, 14729, 14729, 14729, 15101, 15101, 15101, 15101,
15101, 15101, 15101, 15101, 15101, 15101, 15185, 15185, 15185,
15185, 15185, 15185, 15185, 15185, 10435, 10435, 10435, 10435,
10435, 10435, 10435, 15319, 15319, 15319, 15319, 15319, 15319,
15319, 15319, 15319, 15319, 15319, 15319, 15319, 15319, 15319,
15319, 15319, 15319, 15319, 15842, 15842, 15842, 15352, 15352,
15352, 15352, 15352, 15352, 15352, 15352, 12173, 10576, 10426,
11971, 15276, 15083, 15209, 15181, 15176, 15204, 15239, 15597,
15184, 15149, 15093, 15162, 10916, 15175, 13380, 15246, 15206,
14859, 12304, 12074, 15174, 13467, 12633, 13307, 10414, 10456,
15170, 15173, 15172, 15187, 15201, 16160, 15171, 11640, 12814,
16013, 10552, 15255, 14834, 14525, 15285, 15286, 15163, 15169,
15268, 15202, 14999, 15264, 15166, 15377, 15211, 14167, 15203,
15210, 12153, 15299, 15299, 15299, 15299, 15299, 15299, 15299,
15299, 15299, 15299, 15299, 15299, 15299, 15299, 15299, 15299,
15299, 15299, 15299, 15299, NA, NA)), .Names = c("Order.Id",
"ProductID", "UnitPrice", "UnitCost", "Quantity", "Order.Date",
"TotalUnitPrice", "Quarter", "TotalUnitCost", "GrossMargin",
"Customer.Id"), row.names = c(NA, 29487L), class = "data.frame")
is the output of one window console
Upvotes: 0
Views: 5337
Reputation: 81693
First, transform the %
strings to numeric values:
df[2] <- as.numeric(gsub("%", "", as.character(df[ , 2])))
Calculate avergae GrossMargin
for each Quarter
:
dat <- aggregate(GrossMargin ~ Quarter, df, mean)
Plot:
plot(as.factor(dat$Quarter), dat$GrossMargin)
This answer was first posted in reply to this question.
Upvotes: 0
Reputation: 14370
Here you go using data.table
package for the averaging by
.
Untested as you did not provide reprodicible data
library(data.table)
dt = as.data.table(df)
plotData = dt[,list(MarginAvg=mean(GrossMargin)),by=Quarter]
qplot(plotData$Quarter, plotData$MarginAvg)
Example:
dt = data.table(Quarter=c(1,1,2,3),GrossMargin=c(.176,.178,.185,.216))
plotData = dt[,list(MarginAvg=mean(GrossMargin)),by=Quarter]
plotData
Quarter MarginAvg
1: 1 0.177
2: 2 0.185
3: 3 0.216
plot(plotData$Quarter, plotData$MarginAvg) #just a plot
Upvotes: 1