Reputation: 197
Introduction: I work in a strict environment where I can not install any more packages. I have access to dcast(), xtabs(), reshape() from {stats}. I don't have access to tydr, reshape nor reshape2 packages.
Now to the problem: I have a summarized data frame with 4 columns names cust_id, merchant_group, sum and max that looks like:
cust_id merchant_group sum max
<int> <chr> <dbl> <dbl>
1 495 AIRLINE 45493 4950
2 495 AUTO RENTAL 3104 1000
3 495 CLOTHING STORES 20928 3140
4 495 DEPARTMENT STORES 1082 495
5 495 DRUG STORES 482 165
I want to reshape it into wide form that will look like:
cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
495 45493 3104 20928 1082 482
495 4950 1000 3140 495 165
I have tried functions such as:
xtabs(sum~cust_id+merchant_group, data=my.data)
reshape(my.data, idvar = "cust_id", timevar = "merchant_group", direction = "wide")
But doesn't solve my problem. Thank you in advance for your time.
Upvotes: 0
Views: 57
Reputation: 6813
If you have to use stats::reshape()
, you can
(1) reshape the data to a longer format, in which both sum
and max
are in one column:
my.data.longer <- stats::reshape(data = my.data,
idvar = 1:2,
v.names = "value",
timevar = "variable",
times = c("sum", "max"),
varying = 3:4,
direction = "long")
Which looks like this (don't worry about the row.names for now):
cust_id merchant_group variable value
495.AIRLINE.sum 495 AIRLINE sum 45493
495.AUTO RENTAL.sum 495 AUTO RENTAL sum 3104
495.CLOTHING STORES.sum 495 CLOTHING STORES sum 20928
495.DEPARTMENT STORES.sum 495 DEPARTMENT STORES sum 1082
495.DRUG STORES.sum 495 DRUG STORES sum 482
495.AIRLINE.max 495 AIRLINE max 4950
495.AUTO RENTAL.max 495 AUTO RENTAL max 1000
495.CLOTHING STORES.max 495 CLOTHING STORES max 3140
495.DEPARTMENT STORES.max 495 DEPARTMENT STORES max 495
495.DRUG STORES.max 495 DRUG STORES max 165
(2) reshape the longer data to the wide format you need:
my.data.wide <- stats::reshape(data = my.data.longer,
idvar = c("cust_id", "variable"),
timevar = "merchant_group",
times = as.character(my.data$merchant_group),
v.names = "value",
direction = "wide")
Which looks like this:
cust_id variable value.AIRLINE value.AUTO RENTAL value.CLOTHING STORES value.DEPARTMENT STORES value.DRUG STORES
495.AIRLINE.sum 495 sum 45493 3104 20928 1082 482
495.AIRLINE.max 495 max 4950 1000 3140 495 165
(3) Delete the variable
column, change the column names
and reset the row.names
:
my.data.wide$variable <- NULL
names(my.data.wide)[2:ncol(my.data.wide)] <- as.character(my.data$merchant_group)
row.names(my.data.wide) <- NULL
my.data.wide
The result is:
cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
1 495 45493 3104 20928 1082 482
2 495 4950 1000 3140 495 165
Upvotes: 0