pandas merge rows with same value in one column

Question

I have a pandas dataframe in which one particular column (ID) can have 1, 2, or 3 entries in another column (Number), like this:

    ID             Address        Number
120004   3188 James Street   123-456-789
120004   3188 James Street   111-123-134
120004   100 XYZ Avenue      001-002-003
321002   500 ABC Street      444-222-666
321002   500 ABC Street      323-123-423
321003   800 ABC Street      100-200-300

What I need to do is merge rows with the same ID into a single row, keep only the first address, and fill in the additional columns for any additional "Numbers" if necessary, like so:

    ID             Address        Number1      Number2       Number3
120004   3188 James Street   123-456-789   111-123-134   001-002-003    
321002   500 ABC Street      444-222-666   323-123-423             -
321003   800 ABC Street      100-200-300             -             -

How would I do this? What I did was generate a new dataframe with only the ID and Numbers:

dx = df.set_index(['ID', df.groupby('ID')
.cumcount()])['Number']
.unstack()
.add_prefix('Number')
.reset_index()

And then combining this modified dataframe with the original dataframe, and dropping duplicates/keeping the first index only, but I am wondering if this is correct and if there is a more efficient way.

Allen Qin · Accepted Answer

You can first use groupby to flatten up the Numbers and then rename the columns. Finally, create the Address column by taking the first address from each group.

(
    df.groupby('ID')
    .apply(lambda x: x.Number.tolist())
    .apply(pd.Series)
    .rename(lambda x: f'Number{int(x)+1}', axis=1)
    .assign(Address=df.groupby('ID').Address.first())
)

pandas merge rows with same value in one column

Answers (2)

Related Questions