Faraz Ali Khan
Faraz Ali Khan

Reputation: 93

Merging DataFrames Python

I have 3 Datasets, Each with 3 columns and over 1000 rows. Data is about the count of a combination of 2 symptoms. For example, Pain and Fever with a count of 2. I want an output where these DataFrames are combined in such a way that the count of common symptoms from different DataFrames are summed up and non-common combinations are joined just the way they are. I am attaching a sample DataFrame and the result I want. Help will be appreciated.

#Sample
a = pd.DataFrame({'a':['pain','fever','Headache'],'b':['Cancer','HIV','Piles'],'count':[2,4,5]})
b = pd.DataFrame({'a':['pain','Pyrexia','Headache'],'b':['Cancer','HIV','Piles'],'count':[1,5,7]})
c = pd.DataFrame({'a':['pain','fever','Cancer'],'b':['Cancer','HIV','Piles'],'count':[2,4,5]})

This is the result I want after they have been combined:

#Result
result = pd.DataFrame({'a':['pain','fever','Headache','Pyrexia','Cancer'],\
                       'b':['Cancer','HIV','Piles','HIV','Piles'],'count':[5,8,12,5,5]})
result

    a          b    count
0   pain    Cancer      5
1   fever   HIV         8
2   Headache Piles     12
3   Pyrexia HIV         5
4   Cancer  Piles       5

Upvotes: 0

Views: 33

Answers (2)

Lukas Thaler
Lukas Thaler

Reputation: 2720

This can be achieved by using

pd.concat([a, b, c]).groupby(['a', 'b']).sum().reset_index()

where pd.concat concatenates your dataframes, groupby finds all unique combinations of your a and b columns, sum sums the third column per each of the combinations and reset_index resets the index to the default numeric one, putting the columns back into your dataframe

Upvotes: 1

forgetso
forgetso

Reputation: 2494

Concatenate them vertically (by row) and then group them with a sum.

pd.concat([a,b,c]).groupby(['a','b']).sum()

count
a           b   
Cancer      Piles   5
Headache    Piles   12
Pyrexia     HIV     5
fever       HIV     8
pain        Cancer  5

Upvotes: 1

Related Questions