Louic

Reputation: 2603

How to combine duplicate rows in pandas?

How to combine duplicate rows in pandas, filling in missing values?

In the example below, some rows have missing values in the c1 column, but the c2 column has duplicates that can be used as an index to look up and fill in those missing values.

The input data looks like this:

    c1  c2
id      
0   10.0    a
1   NaN     b
2   30.0    c
3   10.0    a
4   20.0    b
5   NaN     c

desired output:

    c1  c2
0   10  a
1   20  b
2   30  c

But how to do this?

Here is the code to generate the example data (it uses 100, 200 and 300 in c2 in place of a, b and c):

import pandas as pd
df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

Upvotes: 2

Views: 3939

Answers (1)

jezrael

Reputation: 862511

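I think you need sort_values with drop_duplicates. sort_values puts NaN last by default, and drop_duplicates keeps the first row of each c2 group, so a non-missing c1 is kept wherever one exists: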

out = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print(out)
     c1   c2
0  10.0  100
4  20.0  200
2  30.0  300
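A sketch that spells out the defaults the one-liner relies on (`na_position='last'` in sort_values, `keep='first'` in drop_duplicates), then orders the rows by c2 and casts c1 back to int to match the desired output. The final cast assumes every c2 group has at least one non-missing c1; if a NaN survives, astype(int) raises an error.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

out = (
    df.sort_values(['c1', 'c2'], na_position='last')  # NaN rows go to the end
      .drop_duplicates(subset=['c2'], keep='first')   # first row per c2 wins
      .sort_values('c2')                              # order rows by c2
      .reset_index(drop=True)                         # renumber the index 0, 1, 2
      .astype({'c1': int})                            # assumes no NaN is left in c1
)
print(out)
```

This gives c1 values 10, 20, 30 against c2 values 100, 200, 300, with integer c1 as in the desired output.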

Or first remove the rows where c1 is NaN with dropna, then drop duplicates in c2:

df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print (df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
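The two approaches differ when some c2 value has no non-missing c1 at all: dropna removes that group entirely, while the sort-based version keeps one row for it with c1 still NaN. A small check, using a hypothetical extra row with c2 = 400 that is not part of the question's data:

```python
import pandas as pd

# Hypothetical data: the c2 == 400 group has only a missing c1.
df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan'), float('nan')],
    'c2': [100, 200, 300, 100, 200, 300, 400],
})

via_dropna = df.dropna(subset=['c1']).drop_duplicates(['c2'])
via_sort = df.sort_values(['c1', 'c2']).drop_duplicates(['c2'])

print(sorted(via_dropna['c2'].tolist()))  # [100, 200, 300]: the 400 group is gone
print(sorted(via_sort['c2'].tolist()))    # [100, 200, 300, 400]: kept, with c1 NaN
```

Which behaviour is right depends on whether a c2 value without any known c1 should appear in the result.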

If a row should only be dropped when both c1 and c2 match, pass both columns to drop_duplicates:

df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print (df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
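A groupby-based alternative (not part of the answer above, just another standard pandas route): GroupBy.first returns the first non-null value in each group, so it combines the rows without an explicit sort or dropna. If the goal is instead to fill the gaps while keeping every row, transform('first') broadcasts that per-group value back onto the original rows for fillna. The names combined and filled are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

# One row per c2: GroupBy.first skips NaN, so each group gets its first
# non-missing c1. reset_index turns the c2 group keys back into a column.
combined = df.groupby('c2')['c1'].first().reset_index()[['c1', 'c2']]
print(combined)

# Keep all six rows, but fill each missing c1 from another row with the same c2.
filled = df.copy()
filled['c1'] = filled['c1'].fillna(filled.groupby('c2')['c1'].transform('first'))
print(filled)
```

Both results keep c1 as float; `.astype({'c1': int})` converts it once no NaN remains.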

Upvotes: 2
