Reputation: 3474
I have a dataframe with some NaNs:
hostname period Teff
51 Peg 4.2293 5773
51 Peg 4.231 NaN
51 Peg 4.23077 NaN
55 Cnc 44.3787 NaN
55 Cnc 44.373 NaN
55 Cnc 44.4175 NaN
55 Cnc NaN 5234
61 Vir NaN 5577
61 Vir 38.021 NaN
61 Vir 123.01 NaN
The rows with the same "hostname" all refer to the same object, but as you can see, some entries have NaNs under various columns. I'd like to merge all the rows under the same hostname such that I retain the first finite value in each column (drop the row if all values are NaN). So the result should look like this:
hostname period Teff
51 Peg 4.2293 5773
55 Cnc 44.3787 5234
61 Vir 38.021 5577
How would you go about doing this?
Upvotes: 4
Views: 11028
Reputation: 214927
Use groupby.first; it takes the first non-NA value in each column of each group:
df.groupby('hostname')[['period', 'Teff']].first().reset_index()
# hostname period Teff
#0 51 Peg 4.2293 5773
#1 55 Cnc 44.3787 5234
#2 61 Vir 38.0210 5577
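For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the question's table is rebuilt by hand as a DataFrame (the construction below is my own, not the asker's loading code) and that hostname holds the full name, e.g. "51 Peg":

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption
df = pd.DataFrame({
    "hostname": ["51 Peg"] * 3 + ["55 Cnc"] * 4 + ["61 Vir"] * 3,
    "period": [4.2293, 4.231, 4.23077, 44.3787, 44.373, 44.4175,
               np.nan, np.nan, 38.021, 123.01],
    "Teff": [5773, np.nan, np.nan, np.nan, np.nan, np.nan,
             5234, 5577, np.nan, np.nan],
})

# first() picks the first non-NA value per column, independently for each group
result = df.groupby("hostname")[["period", "Teff"]].first().reset_index()
print(result)
```

Because each column is handled separately, the period and Teff in one output row can come from different input rows, which is what the question asks for.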
Or manually do this with a custom aggregation function:
df.groupby('hostname')[['period', 'Teff']].agg(lambda x: x.dropna().iat[0]).reset_index()
This requires each group to have at least one non-NA value in every column; otherwise dropna() returns an empty Series and .iat[0] raises an IndexError.
Write your own function to handle that edge case:
import numpy as np

def first_(g):
    # Return the first non-NA value in the group, or NaN if there is none
    non_na = g.dropna()
    return non_na.iat[0] if len(non_na) > 0 else np.nan
df.groupby('hostname')[['period', 'Teff']].agg(first_).reset_index()
# hostname period Teff
#0 51 Peg 4.2293 5773
#1 55 Cnc 44.3787 5234
#2 61 Vir 38.0210 5577
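The question also asks to drop a row if all of its values are NaN, which none of the calls above do on their own. A rough sketch of one way to handle that follows. The data and the name first_valid are hypothetical, chosen so that host "B" has no finite value at all:

```python
import numpy as np
import pandas as pd

# Hypothetical data: host "B" has no finite value in any column
df = pd.DataFrame({
    "hostname": ["A", "A", "B", "B"],
    "period": [np.nan, 1.5, np.nan, np.nan],
    "Teff": [5000.0, np.nan, np.nan, np.nan],
})

def first_valid(s):
    """Return the first non-NA value of a Series, or NaN if there is none."""
    non_na = s.dropna()
    return non_na.iat[0] if len(non_na) > 0 else np.nan

# Host "B" survives the aggregation as a row of NaNs
merged = df.groupby("hostname")[["period", "Teff"]].agg(first_valid)

# Drop hosts whose aggregated columns are all NaN, as the question asks
cleaned = merged.dropna(how="all").reset_index()
print(cleaned)
```

The dropna(how="all") step should work the same way after groupby.first, since that also leaves NaN where a group has no finite value.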
Upvotes: 10
Reputation: 323226
Is this what you need?
pd.concat([ df1.apply(lambda x: sorted(x, key=pd.isnull)) for _, df1 in df.groupby('hostname')]).dropna()
Out[343]:
hostname period Teff
55 Cnc 44.3787 5234.0
51 Peg 4.2293 5773.0
61 Vir 38.0210 5577.0
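To see why the sort helps, here is a small sketch of the reordering step on its own, using a plain list with made-up placement of NaNs. It only illustrates sorted with key=pd.isnull and is not the full per-group pipeline:

```python
import numpy as np
import pandas as pd

values = [np.nan, 44.3787, np.nan, 44.373]

# pd.isnull gives False for finite values and True for NaN. False sorts
# before True and sorted() is stable, so finite values move to the front
# in their original order and the NaNs collect at the end.
reordered = sorted(values, key=pd.isnull)
print(reordered[:2])
```

Applied to every column of a group, this pushes the finite values into the top rows, so the trailing dropna() keeps only rows where every column is filled.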
Upvotes: 1