I have a df:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan, "filled", 3], [1, "filled", np.nan, 3], [1, "filled", np.nan, 4]], columns=["a", "b", "c", "d"])
   a       b       c  d
0  1     NaN  filled  3
1  1  filled     NaN  3
2  1  filled     NaN  4
And my end result should be:
df = pd.DataFrame([[1, "filled", "filled", 3], [1, "filled", np.nan, 4]], columns = ["a", "b", "c", "d"])
   a       b       c  d
0  1  filled  filled  3
1  1  filled     NaN  4
So I want to merge rows that are identical in every column except b and c. The catch is that a row will not always have another row that differs from it only in columns b and c.
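To make the merge rule concrete: a row has a partner exactly when some other row shares its values in every column other than b and c. A minimal sketch, assuming the key columns are simply "all columns except b and c" (the names `keys` and `has_partner` are illustrative only), flags those rows with `DataFrame.duplicated`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, "filled", 3], [1, "filled", np.nan, 3], [1, "filled", np.nan, 4]],
    columns=["a", "b", "c", "d"],
)

# Every column except b and c decides whether two rows belong together
keys = [col for col in df.columns if col not in ("b", "c")]

# keep=False flags *all* members of a duplicated group, so True means
# "this row has at least one partner to merge with"
has_partner = df.duplicated(subset=keys, keep=False)
print(has_partner.tolist())  # [True, True, False]
```

Row 2 has no partner, so whatever merging approach is used has to pass it through unchanged.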
I can't figure out how to use df.groupby(["a", "d"]).apply() to get what I want.
You can use groupby + first; within each group it selects, column by column, the first non-NaN value:
df.groupby(['a','d'],as_index=False).first()
Out[897]:
   a  d       b       c
0  1  3  filled  filled
1  1  4  filled     NaN
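A slightly more general sketch of the same groupby + first idea, assuming the merge columns are b and c and every other column is a key (`merge_cols`, `keys` and `conflicts` are illustrative names, not part of the answer above). It also restores the original column order, since `as_index=False` puts the key columns first, and checks for groups where first would silently discard a second, different non-NaN value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, "filled", 3], [1, "filled", np.nan, 3], [1, "filled", np.nan, 4]],
    columns=["a", "b", "c", "d"],
)

# Columns whose values get merged; every other column is a grouping key
merge_cols = ["b", "c"]
keys = [col for col in df.columns if col not in merge_cols]

# first() keeps only the first non-NaN value per column, so a group holding
# two *different* non-NaN values in b or c would lose one of them silently
conflicts = df.groupby(keys)[merge_cols].nunique().gt(1).any(axis=1)
if conflicts.any():
    print("groups with conflicting values:", conflicts[conflicts].index.tolist())

# A row without a partner forms a group of one and passes through unchanged
result = df.groupby(keys, as_index=False).first()

# as_index=False puts the key columns first; restore the original order
result = result[df.columns]
print(result)
```

Note that groupby drops rows whose key columns contain NaN by default; passing dropna=False to groupby (pandas 1.1 and later) keeps them.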