Reputation: 2838
I have a pandas dataframe in which one particular column (ID) can have 1, 2, or 3 entries in another column (Number), like this:
ID Address Number
120004 3188 James Street 123-456-789
120004 3188 James Street 111-123-134
120004 100 XYZ Avenue 001-002-003
321002 500 ABC Street 444-222-666
321002 500 ABC Street 323-123-423
321003 800 ABC Street 100-200-300
What I need to do is merge rows with the same ID into a single row, keep only the first address, and fill in the additional columns for any additional "Numbers" if necessary, like so:
ID Address Number1 Number2 Number3
120004 3188 James Street 123-456-789 111-123-134 001-002-003
321002 500 ABC Street 444-222-666 323-123-423 -
321003 800 ABC Street 100-200-300 - -
How would I do this? What I did was generate a new dataframe with only the ID and Numbers:
dx = df.set_index(['ID', df.groupby('ID')
.cumcount()])['Number']
.unstack()
.add_prefix('Number')
.reset_index()
And then combining this modified dataframe with the original dataframe, and dropping duplicates/keeping the first index only, but I am wondering if this is correct and if there is a more efficient way.
Upvotes: 0
Views: 7778
Reputation: 24304
you can try via groupby()
and agg()
:
out=df.groupby('ID',as_index=False).agg({'Number':list,'Address':'first'})
out=out.join(pd.DataFrame(out.pop('Number').tolist()).rename(columns=lambda x:f"Number{x+1}"))
output of out
:
ID Address Number1 Number2 Number3
0 120004 3188 James Street 123-456-789 111-123-134 001-002-003
1 321002 500 ABC Street 444-222-666 323-123-423 None
2 321003 800 ABC Street 100-200-300 None None
Upvotes: 1
Reputation: 19947
You can first use groupby
to flatten up the Numbers
and then rename the columns. Finally, create the Address
column by taking the first address from each group.
(
df.groupby('ID')
.apply(lambda x: x.Number.tolist())
.apply(pd.Series)
.rename(lambda x: f'Number{int(x)+1}', axis=1)
.assign(Address=df.groupby('ID').Address.first())
)
Upvotes: 1