Reputation: 93
I have 3 Datasets, Each with 3 columns and over 1000 rows. Data is about the count of a combination of 2 symptoms. For example, Pain and Fever with a count of 2. I want an output where these DataFrames are combined in such a way that the count of common symptoms from different DataFrames are summed up and non-common combinations are joined just the way they are. I am attaching a sample DataFrame and the result I want. Help will be appreciated.
#Sample
a = pd.DataFrame({'a':['pain','fever','Headache'],'b':['Cancer','HIV','Piles'],'count':[2,4,5]})
b = pd.DataFrame({'a':['pain','Pyrexia','Headache'],'b':['Cancer','HIV','Piles'],'count':[1,5,7]})
c = pd.DataFrame({'a':['pain','fever','Cancer'],'b':['Cancer','HIV','Piles'],'count':[2,4,5]})
This is the result I want after they have been combined:
#Result
result = pd.DataFrame({'a':['pain','fever','Headache','Pyrexia','Cancer'],\
'b':['Cancer','HIV','Piles','HIV','Piles'],'count':[5,8,12,5,5]})
result
a b count
0 pain Cancer 5
1 fever HIV 8
2 Headache Piles 12
3 Pyrexia HIV 5
4 Cancer Piles 5
Upvotes: 0
Views: 33
Reputation: 2720
This can be achieved by using
pd.concat([a, b, c]).groupby(['a', 'b']).sum().reset_index()
where pd.concat
concatenates your dataframes, groupby
finds all unique combinations of your a
and b
columns, sum
sums the third column per each of the combinations and reset_index
resets the index to the default numeric one, putting the columns back into your dataframe
Upvotes: 1
Reputation: 2494
Concatenate them vertically (by row) and then group them with a sum.
pd.concat([a,b,c]).groupby(['a','b']).sum()
count
a b
Cancer Piles 5
Headache Piles 12
Pyrexia HIV 5
fever HIV 8
pain Cancer 5
Upvotes: 1