dnclem
dnclem

Reputation: 2838

pandas merge rows with same value in one column

I have a pandas dataframe in which one particular column (ID) can have 1, 2, or 3 entries in another column (Number), like this:

    ID             Address        Number
120004   3188 James Street   123-456-789
120004   3188 James Street   111-123-134
120004   100 XYZ Avenue      001-002-003
321002   500 ABC Street      444-222-666
321002   500 ABC Street      323-123-423
321003   800 ABC Street      100-200-300

What I need to do is merge rows with the same ID into a single row, keep only the first address, and fill in the additional columns for any additional "Numbers" if necessary, like so:

    ID             Address        Number1      Number2       Number3
120004   3188 James Street   123-456-789   111-123-134   001-002-003    
321002   500 ABC Street      444-222-666   323-123-423             -
321003   800 ABC Street      100-200-300             -             -

How would I do this? What I did was generate a new dataframe with only the ID and Numbers:

dx = df.set_index(['ID', df.groupby('ID')
.cumcount()])['Number']
.unstack()
.add_prefix('Number')
.reset_index()

And then combining this modified dataframe with the original dataframe, and dropping duplicates/keeping the first index only, but I am wondering if this is correct and if there is a more efficient way.

Upvotes: 0

Views: 7778

Answers (2)

Anurag Dabas
Anurag Dabas

Reputation: 24304

you can try via groupby() and agg():

out=df.groupby('ID',as_index=False).agg({'Number':list,'Address':'first'})
out=out.join(pd.DataFrame(out.pop('Number').tolist()).rename(columns=lambda x:f"Number{x+1}"))

output of out:

    ID      Address             Number1         Number2         Number3
0   120004  3188 James Street   123-456-789     111-123-134     001-002-003
1   321002  500 ABC Street      444-222-666     323-123-423     None
2   321003  800 ABC Street      100-200-300     None            None

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

You can first use groupby to flatten up the Numbers and then rename the columns. Finally, create the Address column by taking the first address from each group.

(
    df.groupby('ID')
    .apply(lambda x: x.Number.tolist())
    .apply(pd.Series)
    .rename(lambda x: f'Number{int(x)+1}', axis=1)
    .assign(Address=df.groupby('ID').Address.first())
)

Upvotes: 1

Related Questions