How to get plot of average value from R

Question

I have different quarters like

Quarter           GrossMargin
2009 Q1           17.60%

2009 Q1            17.80%

2010 Q2             18.50%

2011 Q1             21.60%

See like this i have big data set. I want to plot this in R. I used qplot(df$Quarter, df$GrossMargin, binwidth=.5)

It came dots at the data points of each occurance. Like in x-axis 2009 Q1 there are correspondingly two data points in Y-axis one at 17.6 and other at 17.8. But i want an average value for each quarter, like in x-axis 2009 Q1 should correspond to the single value in Y-axis (that is average of 17.6 and 17.8). If i try to do mean(GrossMargin), it gives mean of whole column, which is of no use. All my quarters(2009-Q1,Q2,Q3, 2010-Q1,Q2,Q3) are in one column and corresponding values of GrossMargin in other single column. How to do this in R

        Order.Id    ProductID   UnitPrice   UnitCost    Quantity    Order.Date  TotalUnitPrice  Quarter GrossMargin Customer.Id TotalUnitCost
1   24849   BDM10023    28.87   23.8    1   01-01-2009  28.87   2009 Q1 17.60%  10025   23.8
2   24849   1283484PMR29    77.84   64  2   01-01-2009  155.68  2009 Q1 17.80%  10025   128
3   24850   1283484SST30    25.78   20  4   01-02-2009  103.12  2009 Q1 22.40%  10096   80
4   24852   GFO20015    8.2 6.75    1   01-02-2009  8.2 2009 Q1 17.70%  10062   6.75
5   24852   OCM10018    8.24    6.8 2   01-02-2009  16.48   2009 Q1 17.50%  10062   13.6
6   24852   BDM10021    8.24    6.8 4   01-02-2009  32.96   2009 Q1 17.50%  10062   27.2
7   24852   POW20011    11.33   9.25    4   01-02-2009  45.32   2009 Q1 18.40%  10062   37
8   24852   AP6011  9.22    7.5 2   01-02-2009  18.44   2009 Q1 18.70%  10062   15
9   24852   POW30012    8.24    6.5 2   01-02-2009  16.48   2009 Q1 21.10%  10062   13
10  24852   PPF20017    11.86   10.25   1   01-02-2009  11.86   2009 Q1 13.60%  10062   10.25
11  24853   AP3008  8.2 6.75    1   01-02-2009  8.2 2009 Q1 17.70%  10030   6.75
12  24855   VEV10023    8.2 6.75    1   01-03-2009  8.2 2009 Q1 17.70%  10037   6.75
13  24855   AP6006  7.73    6.3 2   01-03-2009  15.46   2009 Q1 18.50%  10037   12.6
14  24855   AP5010  8.2 6.75    2   01-03-2009  16.4    2009 Q1 17.70%  10037   13.5
15  24856   1283484PMS30    7.21    5.9 2   01-03-2009  14.42   2009 Q1 18.20%  10078   11.8
16  24857   AP4009  7.16    5   2   01-03-2009  14.32   2009 Q1 30.20%  10032   10
17  24857   GFO10014    7.16    5.9 2   01-03-2009  14.32   2009 Q1 17.60%  10032   11.8
18  24858   AP3003  6.17    5   1   01-04-2009  6.17    2009 Q1 19.00%  10243   5
19  24858   OWW3009 10.25   8.75    1   01-04-2009  10.25   2009 Q1 14.60%  10243   8.75
20  24858   BDM10022    6.18    5.1 1   01-04-2009  6.18    2009 Q1 17.50%  10243   5.1
21  24858   AP2008  6.13    5   1   01-04-2009  6.13    2009 Q1 18.40%  10243   5
22  24858   AP5005  6.7 5.25    1   01-04-2009  6.7 2009 Q1 21.60%  10243   5.25
23  24859   POW30012    8.24    6.5 2   01-04-2009  16.48   2009 Q1 21.10%  10052   13
24  24860   POW20011    11.33   9.25    4   01-04-2009  45.32   2009 Q1 18.40%  10019   37
25  24861   POW10010    18.14   15  2   01-04-2009  36.28   2009 Q1 17.30%  13710   30
26  24861   OWW3009 10.25   8.75    1   01-04-2009  10.25   2009 Q1 14.60%  13710   8.75
27  24862   1283484CPN28    13.35   11  4   01-04-2009  53.4    2009 Q1 17.60%  15310   44

Paste this into excel. I have created this excel file using write command in R. There are about more than 10000 records in original file

  > str(df)
'data.frame':   29487 obs. of  11 variables:
 $ Order.Id      : num  24849 24849 24850 24852 24852 ...
 $ ProductID     : Factor w/ 42 levels "1202020SFB25",..: 24 4 7 29 31 22 36 19 37 39 ...
 $ UnitPrice     : num  28.87 77.84 25.78 8.2 8.24 ...
 $ UnitCost      : num  23.8 64 20 6.75 6.8 ...
 $ Quantity      : num  1 2 4 1 2 4 4 2 2 1 ...
 $ Order.Date    : Factor w/ 1261 levels "1/1/2009","1/1/2010",..: 1 1 45 45 45 45 45 45 45 45 ...
 $ TotalUnitPrice: num  28.9 155.7 103.1 8.2 16.5 ...
 $ Quarter       : chr  "2009 Q1" "2009 Q1" "2009 Q1" "2009 Q1" ...
 $ TotalUnitCost : num  23.8 128 80 6.75 13.6 ...
 $ GrossMargin   : chr  "17.6%" "17.8%" "22.4%" "17.7%" ...
 $ Customer.Id   : num  10025 10025 10096 10062 10062 ...
    > dput(df)

 15097, 15097, 12466, 12466, 15104, 15104, 15104, 15104, 15104, 
    15104, 15104, 15104, 15104, 15104, 15104, 15104, 15000, 15099, 
    15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 
    15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 15099, 
    15099, 15099, 15099, 14546, 14546, 14546, 14546, 14546, 15349, 
    15349, 15349, 14729, 14729, 14729, 15101, 15101, 15101, 15101, 
    15101, 15101, 15101, 15101, 15101, 15101, 15185, 15185, 15185, 
    15185, 15185, 15185, 15185, 15185, 10435, 10435, 10435, 10435, 
    10435, 10435, 10435, 15319, 15319, 15319, 15319, 15319, 15319, 
    15319, 15319, 15319, 15319, 15319, 15319, 15319, 15319, 15319, 
    15319, 15319, 15319, 15319, 15842, 15842, 15842, 15352, 15352, 
    15352, 15352, 15352, 15352, 15352, 15352, 12173, 10576, 10426, 
    11971, 15276, 15083, 15209, 15181, 15176, 15204, 15239, 15597, 
    15184, 15149, 15093, 15162, 10916, 15175, 13380, 15246, 15206, 
    14859, 12304, 12074, 15174, 13467, 12633, 13307, 10414, 10456, 
    15170, 15173, 15172, 15187, 15201, 16160, 15171, 11640, 12814, 
    16013, 10552, 15255, 14834, 14525, 15285, 15286, 15163, 15169, 
    15268, 15202, 14999, 15264, 15166, 15377, 15211, 14167, 15203, 
    15210, 12153, 15299, 15299, 15299, 15299, 15299, 15299, 15299, 
    15299, 15299, 15299, 15299, 15299, 15299, 15299, 15299, 15299, 
    15299, 15299, 15299, 15299, NA, NA)), .Names = c("Order.Id", 
"ProductID", "UnitPrice", "UnitCost", "Quantity", "Order.Date", 
"TotalUnitPrice", "Quarter", "TotalUnitCost", "GrossMargin", 
"Customer.Id"), row.names = c(NA, 29487L), class = "data.frame")

is the output of one window console

Sven Hohenstein · Accepted Answer

First, transform the % strings to numeric values:

df[2] <- as.numeric(gsub("%", "", as.character(df[ , 2])))

Calculate avergae GrossMargin for each Quarter:

dat <- aggregate(GrossMargin ~ Quarter, df, mean)

Plot:

plot(as.factor(dat$Quarter), dat$GrossMargin)

enter image description here

This answer was first posted in reply to this question.

How to get plot of average value from R

Answers (2)

Related Questions