Reputation: 21
I have row-wise transaction data in the following long format
TransactionId ItemCode
101 123
101 521
102 423
103 871
103 982
103 131
I want to reshape this to get all items in one transaction in one row, so the new reshaped table looks like this (wide)
TransactionID ItemCode(s)
101 123 521
102 423
103 871 982 131
I have tried melting and casting in R, but i am a bit confused about the how the methods work. Also my dataset is huge, it has 30M rows. How should I go about this process?
Upvotes: -2
Views: 320
Reputation: 7455
You can use the plyr
package to accomplish what you want. Assuming your data is in the data frame products_sample
, you can use ddply
with spread
from the tidyr
package. This will put the different ItemCodes
from each TransactionId
into different columns.
library(plyr)
library(tidyr)
result <- ddply(products_sample, "TransactionId", spread, ItemCode, ItemCode)
Using the data you provided:
TransactionId ItemCode
1 101 123
2 101 521
3 102 423
4 103 871
5 103 982
6 103 123
This gives
print(result)
## TransactionId 123 521 423 131 871 982
##1 101 123 521 NA NA NA NA
##2 102 NA NA 423 NA NA NA
##3 103 NA NA NA 131 871 982
The last two arguments to ddply
are the key
and value
arguments passed to spread
. The key
identifies the column in products_sample
whose values are to be used as the column names. The value
identifies the column in products_sample
whose values will populate the cells. See ?spread
for details.
Upvotes: 1