Reputation: 23
I am working with a medication prescription dataset which I want to transfer from long to wide format.
I tried to use the reshape
function, however, this requires a time variable, which I don't have (at least not in a useful format I believe).
Concept dataset:
id <- c(1, 1, 1, 2, 2, 3, 3, 3)
prescription_date <- c("17JAN2009", "02MAR2009", "20MAR2009", "05JUL2009", "10APR2009", "09MAY2009", "13JUN2009", "29MAY2009")
med <- c("A", "B", "A", "B", "A", "B", "A", "B")
df <- data.frame(id, prescription_date, med)
To make a time variable I have tried to make a time variable like 1st, 2nd, etc med per id, but I didn't succeed.
Background: I want this in a wide format to eventually create definitions for diagnoses (i.e. if a patient had >1 prescriptions of A, diagnosis is confirmed). This has to be combined with factors from other datasets, hence the idea to go from long to wide.
Any help is much appreciated, thank you.
Upvotes: 1
Views: 49
Reputation: 66490
You might consider keeping the data in long format to perform some of these calculations. I would also suggest changing your dates into a date format that can be calculated upon. This will show, for instance, that the last two rows are not chronological. For instance:
library(dplyr)
df %>%
mutate(prescription_date = lubridate::dmy(prescription_date)) %>%
arrange(id, prescription_date) %>%
group_by(id) %>%
mutate(A_cuml = cumsum(med=="A"),
A_ttl = sum(med=="A")) %>%
ungroup()
# A tibble: 8 × 5
id prescription_date med A_cuml A_ttl
<dbl> <date> <chr> <int> <int>
1 1 2009-01-17 A 1 2
2 1 2009-03-02 B 1 2
3 1 2009-03-20 A 2 2
4 2 2009-04-10 A 1 1
5 2 2009-07-05 B 1 1
6 3 2009-05-09 B 0 1
7 3 2009-05-29 B 0 1
8 3 2009-06-13 A 1 1
If you calculate summary stats for each id
, you might save this in a summarized table with one row per id
and use joins (e.g. left_join
) to append the results of each of these summaries.
Upvotes: 1