Reputation: 33
I have a dataset DT
where each row represents the performance of an individual in a race:
personID | raceDate | raceID | finPos |
---|---|---|---|
person1 | 2009-08-14 | 489801 | 2 |
person1 | 2010-04-17 | 502397 | 6 |
person1 | 2011-03-10 | 524554 | 4 |
person2 | 2009-08-14 | 489801 | 1 |
person2 | 2011-03-10 | 524554 | 3 |
... | ... | ... | ... |
I want to transform the dataset so there is 1 row per person, and there are columns for their race performances in order of the race date (with NA values for individuals who didn't attend a specific race). For example:
personID | 489801 | finPos | 502397 | finPos | 524554 | finPos |
---|---|---|---|---|---|---|
person1 | 2009-08-14 | 2 | 2010-04-17 | 6 | 2011-03-10 | 4 |
person2 | 2009-08-14 | 1 | NA | NA | 2011-03-10 | 3 |
I know I can get part of the way there with reshape2 for example:
reshape(DT, direction = "wide", idvar = "raceID", timevar = "raceDate")
But is there a way to ensure that the combination of raceID/raceDat/finishPos stays together?
Upvotes: 0
Views: 339
Reputation: 66415
Here's a tidyr approach. The data pairs are together, but I have them swapped in order.
library(tidyr)
DT %>%
pivot_wider(id_cols = personID,
names_from = c(raceID),
names_glue = "{raceID}_{.value}",
values_from = c(raceDate, finPos)) %>%
select(personID, sort(colnames(.)))
## A tibble: 2 x 7
# personID `489801_finPos` `489801_raceDate` `502397_finPos` `502397_raceDate` `524554_finPos` `524554_raceDate`
# <chr> <int> <chr> <int> <chr> <int> <chr>
#1 person1 2 2009-08-14 6 2010-04-17 4 2011-03-10
#2 person2 1 2009-08-14 NA NA 3 2011-03-10
Upvotes: 1
Reputation: 27732
data.table
approach
library(data.table)
DT <- fread("personID raceDate raceID finPos
person1 2009-08-14 489801 2
person1 2010-04-17 502397 6
person1 2011-03-10 524554 4
person2 2009-08-14 489801 1
person2 2011-03-10 524554 3")
# Cast to wide
cols <- c("raceDate", "finPos") #value columns to cast
answer <- dcast(DT, personID ~ raceID, value.var = cols, drop = FALSE)
now the data is cast to wide, but is 'grouped' by value-variable. All you need to do now is to reorder the columns...
# Determine column order
new_col_order <- CJ( sort(unique(DT$raceID)), cols, sorted = FALSE)[, paste(cols, V1, sep = "_")]
# Set new column order
setcolorder(answer, c(setdiff(names(answer), new_col_order), new_col_order))
# personID raceDate_489801 finPos_489801 raceDate_502397 finPos_502397 raceDate_524554 finPos_524554
# 1: person1 2009-08-14 2 2010-04-17 6 2011-03-10 4
# 2: person2 2009-08-14 1 <NA> NA 2011-03-10 3
Upvotes: 1