Reputation: 1471
I've read quite a few different methods on joining and still haven't really found a solution that I can wrap my head around. Was hoping for some input or guidance.
I have a dataframe with a set of columns that looks like the following:
In [1]: df_old
Out[1]:
CID time_a time_b time_c time_d
dc12 4:14pm NaN NaN NaN
dc12 NaN 4:18pm NaN NaN
dc12 NaN NaN 4:44pm NaN
ab14 2:14pm NaN NaN NaN
ab14 NaN 3:18pm NaN NaN
ab14 NaN NaN 3:27pm NaN
ab14 NaN NaN NaN 4:15pm
What I want would be the following:
In [2]: df_new
Out[2]:
CID time_a time_b time_c time_d
dc12 4:14pm 4:18pm 4:44pm NaN
ab14 2:14pm 3:18pm 3:27pm 4:15pm
...
I think there's a method of doing it with df.groupby() but I wasn't able to get any results and was wondering if anybody could point me in the right direction.
Thanks so much in advance for your help!
Upvotes: 0
Views: 178
Reputation: 352989
You could use groupby
and then call .first()
, which will give you the first non-nan value seen (which is why I was wondering whether there was only one):
>>> df.groupby("CID", as_index=False).first()
CID time_a time_b time_c time_d
0 ab14 2:14pm 3:18pm 3:27pm 4:15pm
1 dc12 4:14pm 4:18pm 4:44pm NaN
>>> df.groupby("CID", as_index=False, sort=False).first()
CID time_a time_b time_c time_d
0 dc12 4:14pm 4:18pm 4:44pm NaN
1 ab14 2:14pm 3:18pm 3:27pm 4:15pm
This assumes CID is a column and not an index. If it's an index, either call reset_index
or use df.groupby(level=0).first()
instead.
Upvotes: 2