mburke05
mburke05

Reputation: 1471

Pandas Groupby Row with Multiple Columns

I've read quite a few different methods on joining and still haven't really found a solution that I can wrap my head around. Was hoping for some input or guidance.

I have a dataframe with a set of columns that looks like the following:

In [1]: df_old
Out[1]: 
CID     time_a     time_b     time_c     time_d
dc12    4:14pm     NaN        NaN        NaN
dc12    NaN        4:18pm     NaN        NaN
dc12    NaN        NaN        4:44pm     NaN
ab14    2:14pm     NaN        NaN        NaN
ab14    NaN        3:18pm     NaN        NaN
ab14    NaN        NaN        3:27pm     NaN
ab14    NaN        NaN        NaN        4:15pm

What I want would be the following:

In [2]: df_new
Out[2]: 

CID     time_a     time_b     time_c     time_d
dc12    4:14pm     4:18pm     4:44pm     NaN
ab14    2:14pm     3:18pm     3:27pm     4:15pm
...

I think there's a method of doing it with df.groupby() but I wasn't able to get any results and was wondering if anybody could point me in the right direction.

Thanks so much in advance for your help!

Upvotes: 0

Views: 178

Answers (1)

DSM
DSM

Reputation: 352989

You could use groupby and then call .first(), which will give you the first non-nan value seen (which is why I was wondering whether there was only one):

>>> df.groupby("CID", as_index=False).first()
    CID  time_a  time_b  time_c  time_d
0  ab14  2:14pm  3:18pm  3:27pm  4:15pm
1  dc12  4:14pm  4:18pm  4:44pm     NaN
>>> df.groupby("CID", as_index=False, sort=False).first()
    CID  time_a  time_b  time_c  time_d
0  dc12  4:14pm  4:18pm  4:44pm     NaN
1  ab14  2:14pm  3:18pm  3:27pm  4:15pm

This assumes CID is a column and not an index. If it's an index, either call reset_index or use df.groupby(level=0).first() instead.

Upvotes: 2

Related Questions