techKid-Rinshad
techKid-Rinshad

Reputation: 197

convert summarized data frame from long to wide (without using reshape, reshape2, tydr)

Introduction: I work in a strict environment where I can not install any more packages. I have access to dcast(), xtabs(), reshape() from {stats}. I don't have access to tydr, reshape nor reshape2 packages.

Now to the problem: I have a summarized data frame with 4 columns names cust_id, merchant_group, sum and max that looks like:

       cust_id merchant_group          sum   max
         <int> <chr>                  <dbl> <dbl>
 1         495 AIRLINE               45493 4950 
 2         495 AUTO RENTAL            3104 1000 
 3         495 CLOTHING STORES       20928 3140 
 4         495 DEPARTMENT STORES      1082  495
 5         495 DRUG STORES             482  165

I want to reshape it into wide form that will look like:

cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT  STORES DRUG STORES
  495   45493   3104        20928           1082               482  
  495   4950    1000        3140            495                165

I have tried functions such as:

xtabs(sum~cust_id+merchant_group, data=my.data)

reshape(my.data, idvar = "cust_id", timevar = "merchant_group", direction = "wide")

But doesn't solve my problem. Thank you in advance for your time.

Upvotes: 0

Views: 57

Answers (1)

clemens
clemens

Reputation: 6813

If you have to use stats::reshape(), you can

(1) reshape the data to a longer format, in which both sum and max are in one column:

my.data.longer <- stats::reshape(data = my.data,
                                 idvar = 1:2,
                                 v.names = "value",
                                 timevar = "variable",
                                 times = c("sum", "max"),
                                 varying = 3:4,
                                 direction = "long")

Which looks like this (don't worry about the row.names for now):

                          cust_id    merchant_group variable value
495.AIRLINE.sum               495           AIRLINE      sum 45493
495.AUTO RENTAL.sum           495       AUTO RENTAL      sum  3104
495.CLOTHING STORES.sum       495   CLOTHING STORES      sum 20928
495.DEPARTMENT STORES.sum     495 DEPARTMENT STORES      sum  1082
495.DRUG STORES.sum           495       DRUG STORES      sum   482
495.AIRLINE.max               495           AIRLINE      max  4950
495.AUTO RENTAL.max           495       AUTO RENTAL      max  1000
495.CLOTHING STORES.max       495   CLOTHING STORES      max  3140
495.DEPARTMENT STORES.max     495 DEPARTMENT STORES      max   495
495.DRUG STORES.max           495       DRUG STORES      max   165

(2) reshape the longer data to the wide format you need:

my.data.wide <- stats::reshape(data = my.data.longer,
                               idvar = c("cust_id", "variable"),
                               timevar = "merchant_group",
                               times = as.character(my.data$merchant_group),
                               v.names = "value",
                               direction = "wide")

Which looks like this:

                cust_id variable value.AIRLINE value.AUTO RENTAL value.CLOTHING STORES value.DEPARTMENT STORES value.DRUG STORES
495.AIRLINE.sum     495      sum         45493              3104                 20928                    1082               482
495.AIRLINE.max     495      max          4950              1000                  3140                     495               165

(3) Delete the variable column, change the column names and reset the row.names:

my.data.wide$variable <- NULL
names(my.data.wide)[2:ncol(my.data.wide)] <- as.character(my.data$merchant_group)
row.names(my.data.wide) <- NULL
my.data.wide

The result is:

  cust_id AIRLINE AUTO RENTAL CLOTHING STORES DEPARTMENT STORES DRUG STORES
1     495   45493        3104           20928              1082         482
2     495    4950        1000            3140               495         165

Upvotes: 0

Related Questions