jazz_learn
jazz_learn

Reputation: 113

Looking to transpose a data frame on multiple columns in R

Here is a sample dataframe I have

 mydataf.tickersymbol mydataf.yr_qtr mydataf.act_mean
1                  ABC          20084               BB
2                  ABC          20091               BB
3                  ABC          20092               BB
4                  ABC          20093               BB
5                  DEF          20084               BB
6                  DEF          20091               BB
7                  DEF          20092               BB
8                  DEF          20093               BB
9                  DEF          20094               BB
10                 GEF          20092               BB
11                 GEF          20093               BB
12                 GEF          20094                M

Trying to get to output of by joining on yr_qtr for each ticker and caring for the missing data valued. I can go through and loop this and create the logic for all the checks but seems like through data frames I should be able to do this using r dataframe capabilities?

       20084    20091   20092   20093   20094
ABC     AA      BB      BB      BB      
DEF     BB      BB      BB      BB      BM
GEF                     BB      BB      M

Upvotes: 2

Views: 846

Answers (2)

Steven Beaupré
Steven Beaupré

Reputation: 21641

I would simply do:

library(tidyr)
df %>% spread(mydataf.yr_qtr, mydataf.act_mean)

Or using the reshape2 package:

library(reshape2)
dcast(df, mydataf.tickersymbol ~ mydataf.yr_qtr,
                        value.var = "mydataf.act_mean")

Which gives:

  mydataf.tickersymbol 20084 20091 20092 20093 20094
1                  ABC    BB    BB    BB    BB  <NA>
2                  DEF    BB    BB    BB    BB    BB
3                  GEF  <NA>  <NA>    BB    BB     M

Or if we need the "mydataf.tickersymbol" as row names, use acast which will give a matrix output and then convert back to data.frame with as.data.frame

library(reshape)#reshape2_1.4  
as.data.frame(acast(df, mydataf.tickersymbol~mydataf.yr_qtr, 
              value.var='mydataf.act_mean', fill=''))
#     20084 20091 20092 20093 20094
# ABC    BB    BB    BB    BB      
# DEF    BB    BB    BB    BB    BB
# GEF                BB    BB     M

Upvotes: 5

bgoldst
bgoldst

Reputation: 35324

It looks like you're trying to reshape from long format to wide, using 'mydataf.tickersymbol' as the key (aka idvar) and 'mydataf.yr_qtr' as the time variable (can be thought of as a discriminator column that distinguishes wide-format columns). However, the data in your question is slightly inconsistent, e.g. 20094/DEF should be BB rather than BM, and 20084/ABC should be BB rather than AA, no? In any case, I think this is what you're looking for:

df <- data.frame(mydataf.tickersymbol=c('ABC','ABC','ABC','ABC','DEF','DEF','DEF','DEF','DEF','GEF','GEF','GEF'), mydataf.yr_qtr=c(20084,20091,20092,20093,20084,20091,20092,20093,20094,20092,20093,20094), mydataf.act_mean=c('BB','BB','BB','BB','BB','BB','BB','BB','BB','BB','BB','M') );
wide <- reshape(df,dir='w',idvar='mydataf.tickersymbol',timevar='mydataf.yr_qtr');
names(wide) <- sub('^mydataf\\.act_mean\\.','',names(wide));
wide;
##    mydataf.tickersymbol 20084 20091 20092 20093 20094
## 1                   ABC    BB    BB    BB    BB  <NA>
## 5                   DEF    BB    BB    BB    BB    BB
## 10                  GEF  <NA>  <NA>    BB    BB     M

Upvotes: 3

Related Questions