How can I reshape data from long to wide

Question

** Sample data added after comment**

What I have:

pmts <- data.frame(stringsAsFactors=FALSE,
           name = c("johndoe", "johndoe", "janedoe", "foo", "foo", "foo"),
           pmt_amount = c(550L, 550L, 995L, 375L, 375L, 375L),
           pmt_date = c("9/1/16", "11/1/16", "12/15/16", "1/5/17", "3/5/17", "5/5/17")
)

#>      name pmt_amount pmt_date
#> 1 johndoe        550   9/1/16
#> 2 johndoe        550  11/1/16
#> 3 janedoe        995 12/15/16
#> 4     foo        375   1/5/17
#> 5     foo        375   3/5/17
#> 6     foo        375   5/5/17

What I am looking to achieve:

read.table(header = T, text = 
"name    pmt_amount  first_pmt   second_pmt  third_pmt
johndoe    550        9/1/16       11/1/16    NA
  janedoe    995        12/15/16       NA       NA
  foo       375        1/5/17       3/5/17   5/5/17"
)

#>      name pmt_amount first_pmt second_pmt third_pmt
#> 1 johndoe        550    9/1/16    11/1/16      
#> 2 janedoe        995  12/15/16             
#> 3     foo        375    1/5/17     3/5/17    5/5/17

** End of update**

I have a large dataset with payment information for different products. Some of these products have a pay-in-full option as well as a two-pay and three-pay option. I need to create fields that would be First_Payment, Second_Payment, and Third_Payment and would populate NA in the respective fields if there was only one or two payments.

I've tried a couple options and the best workaround I have thus far is this:

pmts %>%
  group_by(Email, Name, Amount, Form.Title) %>%
  summarise(First_Payment = min(Payment.Date),
           Second_Payment = median(Payment.Date),
           Last_Payment = max(Payment.Date)) -> pmts

This obviously is not ideal as is making up a payment date for the 2-pay plans and I would have to instruct the end-user to ignore this field and just look at the 1st and 3rd fields.

I also tried to summarise with partial sorts like this:

n <- length(pmts$Payment.Date)
sort(pmts$Payment.Date,partial=n-1)[n-1]

However, if there wasn't three payments for the person, it would take the n-1 date from the entire data set and apply to all other fields.

Ideally, I would have it so if it was a pay-in-full the the First_Payment field would have the date and the 2nd/3rd fields would say NA. The 2-pay would have 1st and 2nd dates and the 3rd field would say NA. And finally the 3 pay would have all 3 dates.

The end users here are not super data savvy so I'm trying to make this as easy to interpret as possible. Any suggestions would be tremendously appreciated. Thank you!

David Arenburg · Accepted Answer

Using data.table this is a simple one-liner

library(data.table) #v1.9.8+
dcast(setDT(pmts), name + pmt_amount ~ rowid(pmt_amount))
# Using 'pmt_date' as value column. Use 'value.var' to override
#       name pmt_amount        1       2      3
# 1:     foo        375   1/5/17  3/5/17 5/5/17
# 2: janedoe        995 12/15/16      NA     NA
# 3: johndoe        550   9/1/16 11/1/16     NA

dcast converts from long to wide and it accepts expressions. rowid is just adding a row counter per pmt_amount.

How can I reshape data from long to wide

Answers (2)

Related Questions