kommoder_Waran
kommoder_Waran

Reputation: 25

R: Data frame, for each row find cases with same value on a variable and add them to row with additional information (dplyr, pivot?)

I have a data frame in R that looks something like this:

event_ID date person_ID
1234 2000-08-09 5678
2345 2001-09-10 6789
1234 2000-08-09 7890
3456 2001-10-11 5678
... ... ...

For each person, I want to find all cases where another person has been at the same event as that person, which person it was and the date of the event. I want to add all cases of co-occurence with another person with their ID and date for each row/person_ID of the dataframe. Ideally, the result should look like this:

event_ID date person_ID 1st_person_met_ID 1st_person_met_date 2nd_person_met_ID 2nd_person_met_date ...
1234 2000-08-09 5678 7890 2000-08-09 ... ...

I've played around with dplyr and it seems like it should be able to create the result I want, but I haven't figured out exactly how to combine the tools. Any help is appreciated!

Upvotes: 0

Views: 384

Answers (1)

stefan
stefan

Reputation: 124343

With a left_join, some data wrangling and pivot_wider you could do:

Note: Not sure whether you want to keep observation where no other persons have been met. I kept these cases and assigned them an NA.

library(dplyr)
library(tidyr)

df %>% 
  left_join(df, by = c("event_ID"), suffix = c("", "_met")) %>% 
  rename(person_met_ID = person_ID_met, person_met_date = date_met) %>% 
  #### Deal with cases where no persons have been met
  add_count(event_ID, person_ID) %>% 
  filter(n < 2 | !person_ID == person_met_ID) %>%
  select(-n) %>% 
  mutate(person_met_ID = ifelse(!person_ID == person_met_ID, person_met_ID, NA),
         person_met_date = ifelse(!person_ID == person_met_ID, person_met_date, NA)) %>% 
  #### 
  group_by(event_ID, person_ID) %>% 
  mutate(id_met = scales::ordinal(row_number())) %>% 
  ungroup() %>% 
  pivot_wider(names_from = id_met, values_from = c(person_met_ID, person_met_date), names_glue = "{id_met}_{.value}")
#> # A tibble: 4 × 5
#>   event_ID date       person_ID `1st_person_met_ID` `1st_person_met_date`
#>      <int> <chr>          <int>               <int> <chr>                
#> 1     1234 2000-08-09      5678                7890 2000-08-09           
#> 2     2345 2001-09-10      6789                  NA <NA>                 
#> 3     1234 2000-08-09      7890                5678 2000-08-09           
#> 4     3456 2001-10-11      5678                  NA <NA>

Upvotes: 1

Related Questions