Aby
Aby

Reputation: 167

Convert dates into columns by date ranges and mention exact date for particular ID in R

this data has two columns. One with participant IDs and second with dates. I want to convert the dates to columns and within each column I want exact date of that ID. e.g. t1 march to may 2020, t2 June to sep, t3 oct to dec 2020, t4 jan to feb 2021, t5 march to June 2021. This is a dummy code as in my real data, I have ID as character variable rather than numerical eg CASK-01. Say CASK-01 is repeated six times with different dates, so I want that t1 column has that particular which falls into its range and so on.

n <- 6 #in actual data frequencies range from 2-8 or IDs are repeated 2-8 times rather than a fixed number
df <- data.frame(id=rep(1:500, each=n), 
                  date=sample(seq(as.Date('2020/03/01'), as.Date('2021/07/30'), by="day"), 12))

Here is the structure of my real data:

str(df)

tibble [4,347 × 2] (S3: tbl_df/tbl/data.frame)
 $ id        : chr [1:4347] "CI-001" "CI-002" "CI-003" "CI-004" ...
 $ sampledate: POSIXct[1:4347], format: "2020-03-06" "2020-03-06" "2020-03-06" "2020-03-06" ...

I tried to find relevant code but could not find it so far. I could do it manually in excel, in fact I already did that by sorting and then copy-pasting exact date there but I want to do that in R. What do you think? I have posted a dummy result picture as well. Basically first column will be participant id and next five columns are for time, i.e. t1-5 with date ranges. t1 march to may 2020, t2 June to sep, t3 oct to dec 2020, t4 jan to feb 2021, t5 march to June 2021. So dates linked to each participant ID will go into respective time column, t1-5. E.g., if cas-03 has dates: 5 feb 2021 (it’ll show up in t4 column), and 7 and 9 March 2020 (both in t1 but in two row). At the moment each ID is repeated as rows for new linked-date but new table will have IDs repeated in rows only if there’re multiple linked dates in a time column date range. Is it too complicated to do in R. What do think?

This is how it should look

Recently I tried this code but it said: Error in filter(., id <= 3) : object 'id' not found In addition: Warning message: In data.matrix(data) : NAs introduced by coercion

# change to your df to have different rep per IDs
df <- df %>% 
  filter(id <= 3) %>% 
  mutate(id = ifelse(id == 3, 2, id))

df <- df %>% 
  group_by(id) %>% 
  mutate(t = paste0("t", row_number())) %>% 
  arrange(t) %>% 
  ungroup()

df %>% 
  pivot_wider(names_from = t, values_from = date) %>% 
  head()

Upvotes: 0

Views: 247

Answers (1)

yeahman269
yeahman269

Reputation: 794

Well assuming you have exactly the same number of observations per ID (which is the case in your example since df %>% group_by(id) %>% summarise(count = n()) %>% pull(count) %>% unique() = 6) I would advise you to use the tidyr package as follow:

library(tidyr)

n <- 6
df <- data.frame(id=rep(1:500, each=n), 
                 date=sample(seq(as.Date('2020/03/01'), as.Date('2021/07/30'), by="day"), 12))
df %>% group_by(id) %>% summarise(count = n()) %>% pull(count) %>% unique()

df <- df %>% 
  group_by(id) %>% 
  mutate(t = paste0("t", row_number())) %>% 
  arrange(t)

df %>% 
  pivot_wider(names_from = t, values_from = date)

Have a look here: https://tidyr.tidyverse.org/articles/pivot.html

EDIT to your comment

If you do not have the same number of repetition per IDs, that's no big deal. pivot_wider() is smart enought and will add NA values in the output, example:

# change to your df to have different rep per IDs
df <- df %>% 
  filter(id <= 3) %>% 
  mutate(id = ifelse(id == 3, 2, id))

df <- df %>% 
  group_by(id) %>% 
  mutate(t = paste0("t", row_number())) %>% 
  arrange(t) %>% 
  ungroup()

df %>% 
  pivot_wider(names_from = t, values_from = date) %>% 
  head()

will return

# A tibble: 2 x 13
     id t1         t10        t11        t12        t2         t3         t4         t5         t6         t7         t8        
  <dbl> <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>    
1     1 2021-04-06 NA         NA         NA         2020-10-16 2020-03-17 2021-03-20 2020-11-04 2021-03-22 NA         NA        
2     2 2021-07-26 2021-03-20 2020-11-04 2021-03-22 2021-07-02 2020-03-09 2020-05-13 2021-04-23 2020-04-09 2021-04-06 2020-10-16
# ... with 1 more variable: t9 <date>

Upvotes: 0

Related Questions