Natasha R.
Natasha R.

Reputation: 531

How to combine three date columns in a data frame into a single variable?

I have a data frame that looks a bit like this:

   Type   Size `Jul-17` `Aug-17` `Sep-17`
1     A  Large       35       24       80
2     B Medium       81       13       38
3     C  Small       30       64       45
4     D  Large       97       68       65
5     E Medium       31       69       33
6     F  Small       84       74       12

I use the ddply function a lot, and instead of summing the three columns together like below...

result <- ddply(Example, .(Type), (summarize),
                Q3sum = sum(`Jul-17`, `Aug-17`, `Sep-17`))

I'd like to be able to reference a single variable that contains those three columns and call it "Q3". Is there a way to do this that will still allow the data to work with ddply? I've tried setting the three columns to a single variable using Q3<- c(`Jul-17`, `Aug-17`, `Sep-17`), but it doesn't seem to work.

Any suggestions would be greatly appreciated.

Upvotes: 0

Views: 79

Answers (3)

hrbrmstr
hrbrmstr

Reputation: 78832

Reproducible data frame:

read.table(check.names = FALSE, text="Type   Size Jul-17 Aug-17 Sep-17
A  Large       35       24       80
B Medium       81       13       38
C  Small       30       64       45
D  Large       97       68       65
E Medium       31       69       33
F  Small       84       74       12", header=TRUE, stringsAsFactors=FALSE) -> xdf

xdf
##   Type   Size Jul-17 Aug-17 Sep-17
## 1    A  Large     35     24     80
## 2    B Medium     81     13     38
## 3    C  Small     30     64     45
## 4    D  Large     97     68     65
## 5    E Medium     31     69     33
## 6    F  Small     84     74     12

If you just want the sum of the columns into one Q3 column:

xdf$Q3 <- rowSums(xdf[,3:5])

xdf
##   Type   Size Jul-17 Aug-17 Sep-17  Q3
## 1    A  Large     35     24     80 139
## 2    B Medium     81     13     38 132
## 3    C  Small     30     64     45 139
## 4    D  Large     97     68     65 230
## 5    E Medium     31     69     33 133
## 6    F  Small     84     74     12 170

If you want the 3 months making up "Q3" nested into one column:

xdf$q3_alt <- apply(xdf, 1, function(x) { list(as.numeric(x[3:5])) })

xdf
##   Type   Size Jul-17 Aug-17 Sep-17  Q3     q3_alt
## 1    A  Large     35     24     80 139 35, 24, 80
## 2    B Medium     81     13     38 132 81, 13, 38
## 3    C  Small     30     64     45 139 30, 64, 45
## 4    D  Large     97     68     65 230 97, 68, 65
## 5    E Medium     31     69     33 133 31, 69, 33
## 6    F  Small     84     74     12 170 84, 74, 12

str(xdf)
## 'data.frame':    6 obs. of  7 variables:
##  $ Type  : chr  "A" "B" "C" "D" ...
##  $ Size  : chr  "Large" "Medium" "Small" "Large" ...
##  $ Jul-17: int  35 81 30 97 31 84
##  $ Aug-17: int  24 13 64 68 69 74
##  $ Sep-17: int  80 38 45 65 33 12
##  $ Q3    : num  139 132 139 230 133 170
##  $ q3_alt:List of 6
##   ..$ :List of 1
##   .. ..$ : num  35 24 80
##   ..$ :List of 1
##   .. ..$ : num  81 13 38
##   ..$ :List of 1
##   .. ..$ : num  30 64 45
##   ..$ :List of 1
##   .. ..$ : num  97 68 65
##   ..$ :List of 1
##   .. ..$ : num  31 69 33
##   ..$ :List of 1
##   .. ..$ : num  84 74 12

Upvotes: 1

edoardo pedrini
edoardo pedrini

Reputation: 121

the solution is the gather function from tidyr. If you use dplyr you can make it in one line of code.

> library(dplyr)
> library(tidyr)
> df%>%
+   gather(key = Q3,value = values,Jul_17:Sep_17)
   type     size     Q3 values
1     1 A  Large Jul_17     35
2     2 B Medium Jul_17     81
3     3 C  Small Jul_17     30
4     4 D  Large Jul_17     97
5     5 E Medium Jul_17     31
6     6 F  Small Jul_17     84
7     1 A  Large Aug_17     24
8     2 B Medium Aug_17     13
9     3 C  Small Aug_17     64
10    4 D  Large Aug_17     68
11    5 E Medium Aug_17     69
12    6 F  Small Aug_17     74
13    1 A  Large Sep_17     80
14    2 B Medium Sep_17     38
15    3 C  Small Sep_17     45
16    4 D  Large Sep_17     65
17    5 E Medium Sep_17     33
18    6 F  Small Sep_17     12

Upvotes: 1

Andrew Milligan
Andrew Milligan

Reputation: 581

Sounds to me like you want something along the lines of melt from the reshape2 package or gather from the tidyr packge. They will make your data.frame longer with all the Jul-17, Aug-17, and Sep-17 values in one column and another column declaring which month each data point came from.

Check out this nice primer on data tidying.

Upvotes: 0

Related Questions