SuhrudL
SuhrudL

Reputation: 21

R Reshaping Data from long to wide by aggregating on a variable

I have row-wise transaction data in the following long format

TransactionId ItemCode
101   123
101   521
102   423
103   871
103   982
103   131

I want to reshape this to get all items in one transaction in one row, so the new reshaped table looks like this (wide)

TransactionID ItemCode(s)
101   123   521
102   423
103   871   982   131

I have tried melting and casting in R, but i am a bit confused about the how the methods work. Also my dataset is huge, it has 30M rows. How should I go about this process?

Upvotes: -2

Views: 320

Answers (1)

aichao
aichao

Reputation: 7455

You can use the plyr package to accomplish what you want. Assuming your data is in the data frame products_sample, you can use ddply with spread from the tidyr package. This will put the different ItemCodes from each TransactionId into different columns.

library(plyr)
library(tidyr)
result <- ddply(products_sample, "TransactionId", spread, ItemCode, ItemCode)

Using the data you provided:

  TransactionId ItemCode
1           101      123
2           101      521
3           102      423
4           103      871
5           103      982
6           103      123

This gives

print(result)
##  TransactionId 123 521 423 131 871 982
##1           101 123 521  NA  NA  NA  NA
##2           102  NA  NA 423  NA  NA  NA
##3           103  NA  NA  NA 131 871 982

The last two arguments to ddply are the key and value arguments passed to spread. The key identifies the column in products_sample whose values are to be used as the column names. The value identifies the column in products_sample whose values will populate the cells. See ?spread for details.

Upvotes: 1

Related Questions