Pandas flatten rows where possible

Question

I have a dataframe that outputs multiple rows for some IDs based on how much info I have on each, and I want to flatten it where possible. An example of my dataframe is below.

id       address         city       state  country  company  title   email
001      80 Fake St.     San F.     CA     USA      null     null    personal@gmail.com
001      null            null       null   null     Amazon   CEO     null                              
002      2 Test Ave.     San F.     CA     USA      null     null    test@aol.com
002      null            null       null   null     Google   CEO     googleceo@gmail.com

I would like for the rows for 001 to be combined into one, since all values can fit into the same row. For 002, it is ok for them to be on separate rows, since there needs to be two rows anyways to store both emails.

I've tried

df.groupby('id', squeeze=True)

which gets me close to what I want, but for 002 it duplicates all data from the first row and puts it on the second. I would prefer that those stay as null, so that the same data isn't being processed twice once I receive the final output. So my desired final output is:

id       address         city       state  country  company  title   email
001      80 Fake St.     San F.     CA     USA      Amazon   CEO    personal@gmail.com           
002      2 Test Ave.     San F.     CA     USA      null     null    test@aol.com
002      null            null       null   null     Google   CEO     googleceo@gmail.com

I never know which columns will be able to be merged/flattened, so I can't drop duplicates based on certain columns (I don't think). Any suggestions on how to achieve this output would be great!

BENY · Accepted Answer

Here I am using sorted , since it is more easy to read

yourdf=df.groupby('id').apply(lambda x : x.apply(lambda y: sorted(y,key=pd.isnull))).dropna(thresh=2) 
yourdf
   id    address   city state country company title                email
0   1  80FakeSt.  SanF.    CA     USA  Amazon   CEO   personal@gmail.com
2   2  2TestAve.   SaF.    CA     USA  Google   CEO         test@aol.com
3   2        NaN    NaN   NaN     NaN     NaN   NaN  googleceo@gmail.com

If you need the speed , check justify

Pandas flatten rows where possible

Answers (1)

Related Questions