Meghna
Meghna

Reputation: 1

Long to wider format

I have lab records of 30,000 unique ID's. I need to convert my data from long to wider format for each ID and TEST_DATE related to that unique ID.

Example for one ID : enter image description here

I need to convert this to a wider format like this:

enter image description here

I have a dataset with 30,000 ID's and I need to do this for each ID. The ID with the maximum number of tests will determine our number of columns.

I will appreciate any ideas that you might have to solve this problem! Thank you

Upvotes: 0

Views: 79

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26218

Take this simple route

library(tidyverse)

df %>% group_by(ACCT, TEST_DATE) %>% summarise(RESULT = mean(RESULT)) %>%
  group_by(ACCT) %>% mutate(testno = row_number(), resultno = row_number()) %>%
  pivot_wider(id_cols = ACCT, names_from = c("testno", "resultno"), values_from = c(TEST_DATE, RESULT))

# A tibble: 2 x 9
# Groups:   ACCT [2]
   ACCT TEST_DATE_1_1 TEST_DATE_2_2 TEST_DATE_3_3 TEST_DATE_4_4 RESULT_1_1 RESULT_2_2 RESULT_3_3 RESULT_4_4
  <int> <date>        <date>        <date>        <date>             <dbl>      <dbl>      <dbl>      <dbl>
1 37733 2016-01-07    2016-01-09    2016-01-11    2016-08-10             5        4.5          1          2
2 37734 2016-01-21    2016-08-20    NA            NA                     3        4           NA         NA

data (dput) used

> dput(df)
structure(list(ACCT = c(37733L, 37733L, 37733L, 37733L, 37734L, 
37734L, 37733L), TEST_DATE = structure(c(16809, 17023, 16811, 
16807, 17033, 16821, 16809), class = "Date"), RESULT = c(3L, 
2L, 1L, 5L, 4L, 3L, 6L)), row.names = c(NA, -7L), class = "data.frame")

df
> df
   ACCT  TEST_DATE RESULT
1 37733 2016-01-09      3
2 37733 2016-08-10      2
3 37733 2016-01-11      1
4 37733 2016-01-07      5
5 37734 2016-08-20      4
6 37734 2016-01-21      3
7 37733 2016-01-09      6

Upvotes: 0

ThomasIsCoding
ThomasIsCoding

Reputation: 101257

Here is a data.table option with dcast that might help (borrow data from @Duck)

> dcast(setDT(df)[, Q := seq(.N), ACCT], ACCT ~ Q, value.var = c("TEST_DATE", "RESULT"))
    ACCT TEST_DATE_1 TEST_DATE_2 TEST_DATE_3 RESULT_1 RESULT_2 RESULT_3
1: 37733    9/1/2016  10/18/2016   11/1/2016        3        2        1
2: 37734    9/1/2016  10/18/2016   11/1/2016        5        4        3

Another option is using melt along with dcast, where the resulting format might be the one you are exactly after

suppressWarnings({
  type.convert(
    dcast(
      melt(
        setDT(df)[, Q := seq(.N), ACCT],
        id = c("ACCT", "Q"),
        measure = c("TEST_DATE", "RESULT")
      )[order(ACCT, Q)],
      ACCT ~ Q + variable,
      value.var = "value"
    ),
    as.is = TRUE
  )
})

which gives

    ACCT 1_TEST_DATE 1_RESULT 2_TEST_DATE 2_RESULT 3_TEST_DATE 3_RESULT
1: 37733    9/1/2016        3  10/18/2016        2   11/1/2016        1
2: 37734    9/1/2016        5  10/18/2016        4   11/1/2016        3

Upvotes: 0

Duck
Duck

Reputation: 39595

Try this:

library(dplyr)
library(tidyr)
#Code
new <- df %>%
  group_by(ACCT,TEST_DATE) %>%
  summarise(RESULT=round(mean(RESULT,na.rm=T),2)) %>%
  ungroup() %>%
  mutate(across(-ACCT,~as.character(.))) %>%
  pivot_longer(-ACCT) %>%
  group_by(ACCT,name) %>%
  mutate(name=paste0(name,row_number())) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  mutate(across(starts_with('RESULT'),~as.numeric(.)))

Output:

# A tibble: 2 x 7
# Groups:   ACCT [2]
   ACCT TEST_DATE1 RESULT1 TEST_DATE2 RESULT2 TEST_DATE3 RESULT3
  <int> <chr>        <dbl> <chr>        <dbl> <chr>        <dbl>
1 37733 9/1/2016         3 10/18/2016       2 11/1/2016        1
2 37734 9/1/2016         5 10/18/2016       4 11/1/2016        3

Some data used:

#Data
df <- structure(list(ACCT = c(37733L, 37733L, 37733L, 37734L, 37734L, 
37734L), TEST_DATE = c("9/1/2016", "10/18/2016", "11/1/2016", 
"9/1/2016", "10/18/2016", "11/1/2016"), RESULT = c(3L, 2L, 1L, 
5L, 4L, 3L)), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

Related Questions