Alex

Reputation: 13

How to summarise one set of id values in a dataframe grouped by another set of ids

Having gotten my data into the format:

    pId   fId   
1   1     0     
2   1     108   
3   1     940   
4   1     972   
5   1     993   
6   2     0     
7   3     0     
8   3     32    
9   3     108   
10  3     176

My goal (for a much longer set of data) is to determine which fIds each pair of pIds has in common, and from that how many they share. My plan was to summarise the data into one row per pId, where the fIds are collected into a list, and then loop a function like intersect() across those lists, for an ideal output of the format:

   pId1   pId2  together
1   1     2     1
2   1     3     2
3   1     4     N
4   2     3     1

etc....

EDIT: trying to work with the data in one of these ways

   pId  allfId                          allfIdSplit
1   1   0,901,940,972,993               c("0", "901", "940", "972", "993")
2   2   0                               0
3   3   0,32,108,176                    c("0", "32", "108", "176")
4   4   0,200,561,602,629,772,825,991   c("0", "200", "561", "602", "629", "772", "825", "991")
5   5   0                               0

With the code I have so far, where df_a is the starting point shown at the top, giving the output shown in the edit:

library(dplyr)
library(stringr)

df_c <- df_a %>%
  group_by(pId) %>%
  summarize(allfId = paste(unique(fId), collapse = ",")) %>% # one row per pId
  mutate(allfIdSplit = str_split(allfId, ",")) %>%           # split back into character vectors
  print()
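The planned intersect() step works directly on those split vectors; for example, taking the pId 1 and pId 3 vectors from the edit above (the variable names here are hypothetical):

```r
fids_p1 <- c("0", "901", "940", "972", "993") # allfIdSplit for pId 1
fids_p3 <- c("0", "32", "108", "176")         # allfIdSplit for pId 3

shared <- intersect(fids_p1, fids_p3) # elements common to both: "0"
length(shared)                        # 1 fId in common
```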

Upvotes: 1

Views: 156

Answers (1)

jdobres

Reputation: 11957

Here's one way to do it in the tidyverse. See comments in the code.

library(tidyverse)
library(magrittr)

df.counts <- combn(unique(df$pId), 2) %>% # unique combinations of pIds
  t %>% # transpose so each combination becomes a row
  as.data.frame() %>% # to data frame
  set_colnames(c('pId1', 'pId2')) %>%  # name the columns
  left_join(df, by = c(pId1 = 'pId')) %>% # join the original data to pId1
  left_join(df, by = c(pId2 = 'pId')) %>% # join the original data to pId2
  filter(fId.x == fId.y) %>% # keep only rows where the pair shares an fId
  count(pId1, pId2) # count shared fIds per pair

  pId1  pId2     n
  <int> <int> <int>
1     1     2     1
2     1     3     2
3     2     3     1
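For anyone reproducing this, the pipeline above assumes a df built from the question's sample data, something like (a sketch):

```r
# Sample data from the question
df <- data.frame(
  pId = c(1, 1, 1, 1, 1, 2, 3, 3, 3, 3),
  fId = c(0, 108, 940, 972, 993, 0, 0, 32, 108, 176)
)
```

With this input, the pairs (1, 2), (1, 3) and (2, 3) share 1, 2 and 1 fIds respectively, matching the output above.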

An alternative using loops

Loops are usually not the best way to handle these types of problems in R, but since operations like combn seem to be too expensive on your real data, this may be more performant.

pids <- unique(df$pId)

result <- list()

for (x in pids) {
  for (y in pids[pids > x]) { # only pairs with y > x, so each pair appears once
    x.vals <- df$fId[df$pId == x]
    y.vals <- df$fId[df$pId == y]
    together <- length(intersect(x.vals, y.vals))
    result[[length(result) + 1]] <- data.frame(pId1 = x, pId2 = y, together = together)
  }
}

df.new <- do.call(rbind, result)

  pId1 pId2 together
1    1    2        1
2    1    3        2
3    2    3        1

And here is a version that preallocates the size of the final data frame, which may be even more performant:

pids <- unique(df$pId)
n.pairs <- length(pids) * (length(pids) - 1) / 2 # one row per unordered pair
result <- data.frame(pId1 = rep(NA, n.pairs), pId2 = NA, together = NA)
row.num <- 1
for (x in pids) {
  for (y in pids[pids > x]) { # again, visit each unordered pair exactly once
    x.vals <- df$fId[df$pId == x]
    y.vals <- df$fId[df$pId == y]
    together <- length(intersect(x.vals, y.vals))
    result[row.num, 'pId1'] <- x
    result[row.num, 'pId2'] <- y
    result[row.num, 'together'] <- together
    row.num <- row.num + 1
  }
}
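The same unordered-pair counting can also be collapsed into a single combn() pass, which is essentially the question's original plan of looping intersect() over combinations. A sketch, rebuilding the sample data so it runs standalone (the df.pairs name is mine):

```r
# Sample data from the question
df <- data.frame(
  pId = c(1, 1, 1, 1, 1, 2, 3, 3, 3, 3),
  fId = c(0, 108, 940, 972, 993, 0, 0, 32, 108, 176)
)

pids <- unique(df$pId)
pairs <- combn(pids, 2) # 2-row matrix; each column is one unordered pId pair

# For each pair, count the fIds the two pIds share
together <- apply(pairs, 2, function(p) {
  length(intersect(df$fId[df$pId == p[1]], df$fId[df$pId == p[2]]))
})

df.pairs <- data.frame(pId1 = pairs[1, ], pId2 = pairs[2, ], together = together)
df.pairs
#   pId1 pId2 together
# 1    1    2        1
# 2    1    3        2
# 3    2    3        1
```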

Upvotes: 1
