theblackbox

Reputation: 23

R: Long to wide without time

I am working with a medication prescription dataset that I want to transform from long to wide format. I tried the reshape function, but it requires a time variable, which I don't have (at least not in a usable format, I believe).

Concept dataset:

id <- c(1, 1, 1, 2, 2, 3, 3, 3)
prescription_date <- c("17JAN2009", "02MAR2009", "20MAR2009", "05JUL2009", "10APR2009", "09MAY2009", "13JUN2009", "29MAY2009")
med <- c("A", "B", "A", "B", "A", "B", "A", "B")
df <- data.frame(id, prescription_date, med)

I tried to create a time variable myself by numbering the prescriptions within each id (1st, 2nd, etc.), but I didn't succeed.

Background: I want the data in wide format so that I can eventually create definitions for diagnoses (e.g. if a patient had >1 prescription of A, the diagnosis is confirmed). These definitions have to be combined with factors from other datasets, hence the idea to go from long to wide.

Any help is much appreciated, thank you.

Upvotes: 1

Views: 49

Answers (1)

Jon Spring

Reputation: 66490

You might consider keeping the data in long format to perform some of these calculations. I would also suggest converting your dates to a proper date class so they can be sorted and compared. Doing so shows, for example, that the last two rows are not in chronological order:

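library(dplyr)
df %>%
  # parse strings such as "17JAN2009" into Date values
  mutate(prescription_date = lubridate::dmy(prescription_date)) %>%
  # put each id's prescriptions in chronological order
  arrange(id, prescription_date) %>%
  group_by(id) %>%
  # running count and per-id total of prescriptions of med A
  mutate(A_cuml = cumsum(med == "A"),
         A_ttl = sum(med == "A")) %>%
  ungroup()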


# A tibble: 8 × 5
     id prescription_date med   A_cuml A_ttl
  <dbl> <date>            <chr>  <int> <int>
1     1 2009-01-17        A          1     2
2     1 2009-03-02        B          1     2
3     1 2009-03-20        A          2     2
4     2 2009-04-10        A          1     1
5     2 2009-07-05        B          1     1
6     3 2009-05-09        B          0     1
7     3 2009-05-29        B          0     1
8     3 2009-06-13        A          1     1
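
If you do still need a wide table, the per-id sequence number you were trying to build can be made with row_number() after sorting, and then used as the "time" variable. This is only a sketch: the column name rx_order is my own choice, and it assumes tidyr is installed and that your session parses English month abbreviations.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  prescription_date = c("17JAN2009", "02MAR2009", "20MAR2009", "05JUL2009",
                        "10APR2009", "09MAY2009", "13JUN2009", "29MAY2009"),
  med = c("A", "B", "A", "B", "A", "B", "A", "B")
)

wide <- df %>%
  mutate(prescription_date = lubridate::dmy(prescription_date)) %>%
  arrange(id, prescription_date) %>%
  group_by(id) %>%
  # rx_order (hypothetical name): 1st, 2nd, ... prescription within each id
  mutate(rx_order = row_number()) %>%
  ungroup() %>%
  # one row per id; columns become med_1, med_2, ..., prescription_date_1, ...
  pivot_wider(id_cols = id,
              names_from = rx_order,
              values_from = c(med, prescription_date))
```

Ids with fewer prescriptions than the maximum get NA in the trailing columns (here id 2 has no third prescription).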

If you calculate summary statistics for each id, you can store them in a summarized table with one row per id and use joins (e.g. left_join) to append the results of each of these summaries to your other data.
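
As a rough illustration of that summarize-then-join idea, here is a minimal sketch using the rule from the question (more than one prescription of A confirms the diagnosis). The second table, other, and its age column are invented purely for the example, as are the names n_A and dx_confirmed.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3),
  prescription_date = c("17JAN2009", "02MAR2009", "20MAR2009", "05JUL2009",
                        "10APR2009", "09MAY2009", "13JUN2009", "29MAY2009"),
  med = c("A", "B", "A", "B", "A", "B", "A", "B")
)

# One row per id: number of A prescriptions and the resulting diagnosis flag
a_summary <- df %>%
  group_by(id) %>%
  summarise(n_A = sum(med == "A"),
            dx_confirmed = n_A > 1)

# Hypothetical second dataset standing in for "factors from other datasets"
other <- data.frame(id = c(1, 2, 3), age = c(54, 61, 47))

# Append the summary columns to the other dataset, matching on id
combined <- left_join(other, a_summary, by = "id")
```

With the example data, only id 1 has two prescriptions of A, so it is the only id flagged as confirmed.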

Upvotes: 1
