chromebookdev
chromebookdev

Reputation: 494

Merge datasets with certain priority

I have 3 datasets All the same shape CustomerNumber, Name, Status

A customer can appear on 1, 2 or all 3. Each dataset is a list of gold/silver/bronze.

example data:

Dataframe 1:
100,James,Gold

Dataframe 2:
100,James,Silver
101,Paul,Silver

Dataframe 3:
100,James,Bronze
101,Paul,Bronze
102,Fred,Bronze

Expected output/aggregated list:

100,James,Gold
101,Paul,Silver
102,Fred,Bronze

So a customer that is captured in all 3, I want to keep Status as gold.

Have been playing with join and merge and just can’t get it right.

Upvotes: 1

Views: 120

Answers (1)

jezrael
jezrael

Reputation: 862611

Use concat with convert column to ordered categorical, so get priorites if sorting values by multiple columns and last remove duplicates by DataFrame.drop_duplicates:

print (df1)
print (df2)
print (df3)
     a      b     c
0  100  James  Gold
     a      b       c
0  100  James  Silver
1  101   Paul  Silver
     a     b       c
0  101  Paul  Bronze
1  102  Fred  Bronze

df = pd.concat([df1, df2, df3], ignore_index=True)
df['c'] = pd.Categorical(df['c'], ordered=True, categories=['Gold','Silver','Bronze'])

df = df.sort_values(['a','b','c']).drop_duplicates(['a','b'])
print (df)
     a      b       c
0  100  James    Gold
2  101   Paul  Silver
4  102   Fred  Bronze

Upvotes: 1

Related Questions