Reputation: 25
I have a data frame in R that looks something like this:
event_ID | date | person_ID |
---|---|---|
1234 | 2000-08-09 | 5678 |
2345 | 2001-09-10 | 6789 |
1234 | 2000-08-09 | 7890 |
3456 | 2001-10-11 | 5678 |
... | ... | ... |
For each person, I want to find all cases where another person has been at the same event as that person, which person it was and the date of the event. I want to add all cases of co-occurence with another person with their ID and date for each row/person_ID of the dataframe. Ideally, the result should look like this:
event_ID | date | person_ID | 1st_person_met_ID | 1st_person_met_date | 2nd_person_met_ID | 2nd_person_met_date | ... |
---|---|---|---|---|---|---|---|
1234 | 2000-08-09 | 5678 | 7890 | 2000-08-09 | ... | ... |
I've played around with dplyr and it seems like it should be able to create the result I want, but I haven't figured out exactly how to combine the tools. Any help is appreciated!
Upvotes: 0
Views: 384
Reputation: 124343
With a left_join
, some data wrangling and pivot_wider
you could do:
Note: Not sure whether you want to keep observation where no other persons have been met. I kept these cases and assigned them an NA.
library(dplyr)
library(tidyr)
df %>%
left_join(df, by = c("event_ID"), suffix = c("", "_met")) %>%
rename(person_met_ID = person_ID_met, person_met_date = date_met) %>%
#### Deal with cases where no persons have been met
add_count(event_ID, person_ID) %>%
filter(n < 2 | !person_ID == person_met_ID) %>%
select(-n) %>%
mutate(person_met_ID = ifelse(!person_ID == person_met_ID, person_met_ID, NA),
person_met_date = ifelse(!person_ID == person_met_ID, person_met_date, NA)) %>%
####
group_by(event_ID, person_ID) %>%
mutate(id_met = scales::ordinal(row_number())) %>%
ungroup() %>%
pivot_wider(names_from = id_met, values_from = c(person_met_ID, person_met_date), names_glue = "{id_met}_{.value}")
#> # A tibble: 4 × 5
#> event_ID date person_ID `1st_person_met_ID` `1st_person_met_date`
#> <int> <chr> <int> <int> <chr>
#> 1 1234 2000-08-09 5678 7890 2000-08-09
#> 2 2345 2001-09-10 6789 NA <NA>
#> 3 1234 2000-08-09 7890 5678 2000-08-09
#> 4 3456 2001-10-11 5678 NA <NA>
Upvotes: 1