Cameron
Cameron

Reputation: 101

Pandas flatten rows where possible

I have a dataframe that outputs multiple rows for some IDs based on how much info I have on each, and I want to flatten it where possible. An example of my dataframe is below.

id       address         city       state  country  company  title   email
001      80 Fake St.     San F.     CA     USA      null     null    [email protected]
001      null            null       null   null     Amazon   CEO     null                              
002      2 Test Ave.     San F.     CA     USA      null     null    [email protected]
002      null            null       null   null     Google   CEO     [email protected]

I would like for the rows for 001 to be combined into one, since all values can fit into the same row. For 002, it is ok for them to be on separate rows, since there needs to be two rows anyways to store both emails.

I've tried

df.groupby('id', squeeze=True) 

which gets me close to what I want, but for 002 it duplicates all data from the first row and puts it on the second. I would prefer that those stay as null, so that the same data isn't being processed twice once I receive the final output. So my desired final output is:

id       address         city       state  country  company  title   email
001      80 Fake St.     San F.     CA     USA      Amazon   CEO    [email protected]           
002      2 Test Ave.     San F.     CA     USA      null     null    [email protected]
002      null            null       null   null     Google   CEO     [email protected]   

I never know which columns will be able to be merged/flattened, so I can't drop duplicates based on certain columns (I don't think). Any suggestions on how to achieve this output would be great!

Upvotes: 2

Views: 143

Answers (1)

BENY
BENY

Reputation: 323306

Here I am using sorted , since it is more easy to read

yourdf=df.groupby('id').apply(lambda x : x.apply(lambda y: sorted(y,key=pd.isnull))).dropna(thresh=2) 
yourdf
   id    address   city state country company title                email
0   1  80FakeSt.  SanF.    CA     USA  Amazon   CEO   [email protected]
2   2  2TestAve.   SaF.    CA     USA  Google   CEO         [email protected]
3   2        NaN    NaN   NaN     NaN     NaN   NaN  [email protected]

If you need the speed , check justify

Upvotes: 1

Related Questions