Pandas create a new column based on non-unique column matches, concatenating strings

Question

I have a data frame where one objectid maps to one collisionid (not unique)

df1

    objectid    collisionid  
0   3000        0002  
1   4000        0001  
2   5000        0002

I have another data frame containing all possible collision ids:

df2

    collisionid   
0   0001           
1   0002   
2   0003

I want to update df2 to contain a row of all the object ids that map to that collision:

df2 - DESIRED

    collisionid    objectids  
0   0001           4000  
1   0002           3000,5000  
2   0003

(in this case the objectids are stored as strings).

I have attempted doing something like this:

df2['objectids'] = df2['collisionid'].map(df1.set_index('collisionid')['objectid'])

but I get an error Reindexing only valid with uniquely valued Index objects. How do I go about doing this and also trying to merge the strings?

cs95 · Accepted Answer

Start by merging them, then group and aggregate:

(df2.merge(df1, how='left', on='collisionid')
    .fillna('')
    .groupby('collisionid', as_index=False)
    .agg(','.join))

  collisionid   objectid
0        0001       4000
1        0002  3000,5000
2        0003

For a specific column, your agg syntax changes a bit:

(df2.merge(df1, how='left', on='collisionid')
    .fillna('')
    .groupby('collisionid')
    .agg(objectid=('objectid', ','.join)) # agg(output_name=(input_name, aggfunc))
    .reset_index())

  collisionid   objectid
0        0001       4000
1        0002  3000,5000
2        0003

Pandas create a new column based on non-unique column matches, concatenating strings

Answers (1)

Related Questions