Reputation: 1
I have lab records of 30,000 unique ID's. I need to convert my data from long to wider format for each ID and TEST_DATE related to that unique ID.
I need to convert this to a wider format like this:
I have a dataset with 30,000 ID's and I need to do this for each ID. The ID with the maximum number of tests will determine our number of columns.
I will appreciate any ideas that you might have to solve this problem! Thank you
Upvotes: 0
Views: 79
Reputation: 26218
Take this simple route
library(tidyverse)
df %>% group_by(ACCT, TEST_DATE) %>% summarise(RESULT = mean(RESULT)) %>%
group_by(ACCT) %>% mutate(testno = row_number(), resultno = row_number()) %>%
pivot_wider(id_cols = ACCT, names_from = c("testno", "resultno"), values_from = c(TEST_DATE, RESULT))
# A tibble: 2 x 9
# Groups: ACCT [2]
ACCT TEST_DATE_1_1 TEST_DATE_2_2 TEST_DATE_3_3 TEST_DATE_4_4 RESULT_1_1 RESULT_2_2 RESULT_3_3 RESULT_4_4
<int> <date> <date> <date> <date> <dbl> <dbl> <dbl> <dbl>
1 37733 2016-01-07 2016-01-09 2016-01-11 2016-08-10 5 4.5 1 2
2 37734 2016-01-21 2016-08-20 NA NA 3 4 NA NA
data (dput) used
> dput(df)
structure(list(ACCT = c(37733L, 37733L, 37733L, 37733L, 37734L,
37734L, 37733L), TEST_DATE = structure(c(16809, 17023, 16811,
16807, 17033, 16821, 16809), class = "Date"), RESULT = c(3L,
2L, 1L, 5L, 4L, 3L, 6L)), row.names = c(NA, -7L), class = "data.frame")
df
> df
ACCT TEST_DATE RESULT
1 37733 2016-01-09 3
2 37733 2016-08-10 2
3 37733 2016-01-11 1
4 37733 2016-01-07 5
5 37734 2016-08-20 4
6 37734 2016-01-21 3
7 37733 2016-01-09 6
Upvotes: 0
Reputation: 101257
Here is a data.table
option with dcast
that might help (borrow data from @Duck)
> dcast(setDT(df)[, Q := seq(.N), ACCT], ACCT ~ Q, value.var = c("TEST_DATE", "RESULT"))
ACCT TEST_DATE_1 TEST_DATE_2 TEST_DATE_3 RESULT_1 RESULT_2 RESULT_3
1: 37733 9/1/2016 10/18/2016 11/1/2016 3 2 1
2: 37734 9/1/2016 10/18/2016 11/1/2016 5 4 3
Another option is using melt
along with dcast
, where the resulting format might be the one you are exactly after
suppressWarnings({
type.convert(
dcast(
melt(
setDT(df)[, Q := seq(.N), ACCT],
id = c("ACCT", "Q"),
measure = c("TEST_DATE", "RESULT")
)[order(ACCT, Q)],
ACCT ~ Q + variable,
value.var = "value"
),
as.is = TRUE
)
})
which gives
ACCT 1_TEST_DATE 1_RESULT 2_TEST_DATE 2_RESULT 3_TEST_DATE 3_RESULT
1: 37733 9/1/2016 3 10/18/2016 2 11/1/2016 1
2: 37734 9/1/2016 5 10/18/2016 4 11/1/2016 3
Upvotes: 0
Reputation: 39595
Try this:
library(dplyr)
library(tidyr)
#Code
new <- df %>%
group_by(ACCT,TEST_DATE) %>%
summarise(RESULT=round(mean(RESULT,na.rm=T),2)) %>%
ungroup() %>%
mutate(across(-ACCT,~as.character(.))) %>%
pivot_longer(-ACCT) %>%
group_by(ACCT,name) %>%
mutate(name=paste0(name,row_number())) %>%
pivot_wider(names_from = name,values_from=value) %>%
mutate(across(starts_with('RESULT'),~as.numeric(.)))
Output:
# A tibble: 2 x 7
# Groups: ACCT [2]
ACCT TEST_DATE1 RESULT1 TEST_DATE2 RESULT2 TEST_DATE3 RESULT3
<int> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 37733 9/1/2016 3 10/18/2016 2 11/1/2016 1
2 37734 9/1/2016 5 10/18/2016 4 11/1/2016 3
Some data used:
#Data
df <- structure(list(ACCT = c(37733L, 37733L, 37733L, 37734L, 37734L,
37734L), TEST_DATE = c("9/1/2016", "10/18/2016", "11/1/2016",
"9/1/2016", "10/18/2016", "11/1/2016"), RESULT = c(3L, 2L, 1L,
5L, 4L, 3L)), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 2