Reputation: 2603
How to combine duplicate rows in pandas, filling in missing values?
In the example below, some rows have missing values in the c1 column, but the c2 column has duplicates that can be used as an index to look up and fill in those missing values.
The input data looks like this:
      c1 c2
id
0   10.0  a
1    NaN  b
2   30.0  c
3   10.0  a
4   20.0  b
5    NaN  c
Desired output:
   c1 c2
0  10  a
1  20  b
2  30  c
How can I do this?
Here is the code to generate the example data:
import pandas as pd
df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
'c2': [100, 200, 300, 100, 200, 300],
})
Upvotes: 2
Views: 3939
Reputation: 862511
I think you need sort_values with drop_duplicates:
df = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
4  20.0  200
2  30.0  300
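This works because sort_values puts NaN last by default (na_position='last'), so within each c2 value a row with a real c1 comes first and drop_duplicates keeps that one. A minimal sketch that spells this out and then restores a clean 0..n-1 index; the names raw and deduped are only illustrative, not from the question:

```python
import pandas as pd

raw = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

# NaN sorts last, so the first row seen for each c2 value
# has a non-missing c1 whenever one exists.
deduped = (
    raw.sort_values('c1', na_position='last')
       .drop_duplicates('c2', keep='first')
       .sort_values('c2')
       .reset_index(drop=True)
)
print(deduped)
```

If a c2 value only ever appears with NaN in c1, this keeps that row with NaN rather than dropping it.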
Or first remove the rows with NaNs using dropna:
df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
Or, to treat rows as duplicates only when both columns match:
df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print(df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
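Another option, not part of the approaches above, is to group on c2 and take the first non-null c1 in each group, since GroupBy.first() skips missing values. A rough sketch, assuming each c2 value should map to a single c1; the names data and result are illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

# first() returns the first non-null c1 within each c2 group
result = data.groupby('c2')['c1'].first().reset_index()

# Put the columns back in the original order
result = result[['c1', 'c2']]
print(result)
```

Unlike drop_duplicates, this does not preserve the original row labels; the result is indexed 0..n-1 and ordered by c2.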
Upvotes: 2