user2724207
user2724207

Reputation:

tranpose specific columns in data frame

I have dataset that looks like below:

customerid  product store   Date    Sales
1   A   X1  1/2/2013    4
1   B   x2  1/9/2013    4
1   A   x2  1/9/2013    4
1   C   x1  1/16/2013   2
1   B   x1  1/16/2013   2
1   A   x1  1/16/2013   4
2   A   x1  1/23/2013   2
2   B   x2  1/30/2013   2
2   C   x1  2/6/2013    2
2   D   x3  2/13/2013   4

I need to transpose by Product, such that all products appear as columns..like below..

customerid  Date    Store   A   B   C   D
1   1/2/2013    x1  4           
1   1/9/2013    x2  4   4       
1   1/16/2013   x1  4   2   2   
2   1/23/2013   x1  2           
2   1/30/2013   x2      2       
2   2/6/2013    x1          2   
2   2/13/2013   x3              4

Please help! I am trying to work with the transpose function, and i tried unsuccesfully to read through some threads here, but to no avail

Thanks!

Upvotes: 1

Views: 153

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use dcast from "reshape2".

library(reshape2)
dcast(mydf, customerid + store + Date ~ product, value.var="Sales")
#   customerid store      Date  A  B  C  D
# 1          1    x1 1/16/2013  4  2  2 NA
# 2          1    X1  1/2/2013  4 NA NA NA
# 3          1    x2  1/9/2013  4  4 NA NA
# 4          2    x1 1/23/2013  2 NA NA NA
# 5          2    x1  2/6/2013 NA NA  2 NA
# 6          2    x2 1/30/2013 NA  2 NA NA
# 7          2    x3 2/13/2013 NA NA NA  4

If you wanted to have "" instead of NA, you can do that too, but note that you would have coerced those columns to character.

dcast(mydf, customerid + store + Date ~ product, value.var="Sales", fill="")
#   customerid store      Date A B C D
# 1          1    x1 1/16/2013 4 2 2  
# 2          1    X1  1/2/2013 4      
# 3          1    x2  1/9/2013 4 4    
# 4          2    x1 1/23/2013 2      
# 5          2    x1  2/6/2013     2  
# 6          2    x2 1/30/2013   2    
# 7          2    x3 2/13/2013       4

For a base R solution, you can use reshape():

reshape(mydf, direction = "wide", 
        idvar = c("customerid", "store", "Date"), 
        timevar = "product")
#    customerid store      Date Sales.A Sales.B Sales.C Sales.D
# 1           1    X1  1/2/2013       4      NA      NA      NA
# 2           1    x2  1/9/2013       4       4      NA      NA
# 4           1    x1 1/16/2013       4       2       2      NA
# 7           2    x1 1/23/2013       2      NA      NA      NA
# 8           2    x2 1/30/2013      NA       2      NA      NA
# 9           2    x1  2/6/2013      NA      NA       2      NA
# 10          2    x3 2/13/2013      NA      NA      NA       4

Another possibility is to use model.matrix (thanks @Thomas for explaining the model.matrix approach in a recent Q & A):

# cbind(mydf, model.matrix(~ 0 + product, data = mydf) * mydf$Sales)
#    customerid product store      Date Sales productA productB productC productD
# 1           1       A    X1  1/2/2013     4        4        0        0        0
# 2           1       B    x2  1/9/2013     4        0        4        0        0
# 3           1       A    x2  1/9/2013     4        4        0        0        0
# 4           1       C    x1 1/16/2013     2        0        0        2        0
# 5           1       B    x1 1/16/2013     2        0        2        0        0
# 6           1       A    x1 1/16/2013     4        4        0        0        0
# 7           2       A    x1 1/23/2013     2        2        0        0        0
# 8           2       B    x2 1/30/2013     2        0        2        0        0
# 9           2       C    x1  2/6/2013     2        0        0        2        0
# 10          2       D    x3 2/13/2013     4        0        0        0        4

Upvotes: 1

Related Questions