Reputation: 531
I have a data frame that looks a bit like this:
Type Size `Jul-17` `Aug-17` `Sep-17`
1 A Large 35 24 80
2 B Medium 81 13 38
3 C Small 30 64 45
4 D Large 97 68 65
5 E Medium 31 69 33
6 F Small 84 74 12
I use the ddply function a lot, and instead of summing the three columns together like below...
result <- ddply(Example, .(Type), (summarize),
Q3sum = sum(`Jul-17`, `Aug-17`, `Sep-17`))
I'd like to be able to reference a single variable that contains those three columns and call it "Q3". Is there a way to do this that will still allow the data to work with ddply? I've tried setting the three columns to a single variable using Q3<- c(`Jul-17`, `Aug-17`, `Sep-17`)
, but it doesn't seem to work.
Any suggestions would be greatly appreciated.
Upvotes: 0
Views: 79
Reputation: 78832
Reproducible data frame:
read.table(check.names = FALSE, text="Type Size Jul-17 Aug-17 Sep-17
A Large 35 24 80
B Medium 81 13 38
C Small 30 64 45
D Large 97 68 65
E Medium 31 69 33
F Small 84 74 12", header=TRUE, stringsAsFactors=FALSE) -> xdf
xdf
## Type Size Jul-17 Aug-17 Sep-17
## 1 A Large 35 24 80
## 2 B Medium 81 13 38
## 3 C Small 30 64 45
## 4 D Large 97 68 65
## 5 E Medium 31 69 33
## 6 F Small 84 74 12
If you just want the sum of the columns into one Q3
column:
xdf$Q3 <- rowSums(xdf[,3:5])
xdf
## Type Size Jul-17 Aug-17 Sep-17 Q3
## 1 A Large 35 24 80 139
## 2 B Medium 81 13 38 132
## 3 C Small 30 64 45 139
## 4 D Large 97 68 65 230
## 5 E Medium 31 69 33 133
## 6 F Small 84 74 12 170
If you want the 3 months making up "Q3" nested into one column:
xdf$q3_alt <- apply(xdf, 1, function(x) { list(as.numeric(x[3:5])) })
xdf
## Type Size Jul-17 Aug-17 Sep-17 Q3 q3_alt
## 1 A Large 35 24 80 139 35, 24, 80
## 2 B Medium 81 13 38 132 81, 13, 38
## 3 C Small 30 64 45 139 30, 64, 45
## 4 D Large 97 68 65 230 97, 68, 65
## 5 E Medium 31 69 33 133 31, 69, 33
## 6 F Small 84 74 12 170 84, 74, 12
str(xdf)
## 'data.frame': 6 obs. of 7 variables:
## $ Type : chr "A" "B" "C" "D" ...
## $ Size : chr "Large" "Medium" "Small" "Large" ...
## $ Jul-17: int 35 81 30 97 31 84
## $ Aug-17: int 24 13 64 68 69 74
## $ Sep-17: int 80 38 45 65 33 12
## $ Q3 : num 139 132 139 230 133 170
## $ q3_alt:List of 6
## ..$ :List of 1
## .. ..$ : num 35 24 80
## ..$ :List of 1
## .. ..$ : num 81 13 38
## ..$ :List of 1
## .. ..$ : num 30 64 45
## ..$ :List of 1
## .. ..$ : num 97 68 65
## ..$ :List of 1
## .. ..$ : num 31 69 33
## ..$ :List of 1
## .. ..$ : num 84 74 12
Upvotes: 1
Reputation: 121
the solution is the gather function from tidyr. If you use dplyr you can make it in one line of code.
> library(dplyr)
> library(tidyr)
> df%>%
+ gather(key = Q3,value = values,Jul_17:Sep_17)
type size Q3 values
1 1 A Large Jul_17 35
2 2 B Medium Jul_17 81
3 3 C Small Jul_17 30
4 4 D Large Jul_17 97
5 5 E Medium Jul_17 31
6 6 F Small Jul_17 84
7 1 A Large Aug_17 24
8 2 B Medium Aug_17 13
9 3 C Small Aug_17 64
10 4 D Large Aug_17 68
11 5 E Medium Aug_17 69
12 6 F Small Aug_17 74
13 1 A Large Sep_17 80
14 2 B Medium Sep_17 38
15 3 C Small Sep_17 45
16 4 D Large Sep_17 65
17 5 E Medium Sep_17 33
18 6 F Small Sep_17 12
Upvotes: 1
Reputation: 581
Sounds to me like you want something along the lines of melt from the reshape2 package or gather from the tidyr packge. They will make your data.frame
longer with all the Jul-17
, Aug-17
, and Sep-17
values in one column and another column declaring which month each data point came from.
Check out this nice primer on data tidying.
Upvotes: 0