Santhosh
Santhosh

Reputation: 11824

pandas: get only common rows w.r.t a column between two dataframes and sum up the values in another column

I have the following code:

import numpy as np

import pandas as pd

my_array1 = np.array([
["A",5],
["A",10],
["B",5],
["E",55],
["E",55],
])

my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])


df1 = pd.DataFrame(my_array1, columns = ['symbol','score'])
df2 = pd.DataFrame(my_array2, columns = ['symbol','score'])

What i want is form the above two dataframes is

first sum all the values in each table w.r.t score

my_array1 = np.array([
["A",5],
["A",10],
["B",5],
["E",55],
["E",55],
])

becomes

my_array1 = np.array([
["A",15],
["B",5],
["E",110],
])

and

my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])

becomes


my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])

and then

get the only the common between both w.r.t column symbol and again show the total

i.e

From

my_array1 = np.array([
["A",15],
["B",5],
["E",110],
])


and 

my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])

get


my_array2 = np.array([
["A",37],
["B",60],
])

Upvotes: 1

Views: 31

Answers (2)

BENY
BENY

Reputation: 323316

We need convert to numeric first, then groupby with sum

df1.score = pd.to_numeric(df1.score)
df2.score = pd.to_numeric(df2.score)
com = np.intersect1d(df1.symbol,df2.symbol)
pd.concat([df1,df2]).loc[lambda x : x['symbol'].isin(com)].groupby('symbol').sum()
Out[44]: 
        score
symbol       
A          37
B          60

The numpy array only accept one type for each sub-array , so the whole array become ovbject not numeric anymore.

Check the output

my_array1
Out[35]: 
array([['A', '5'],
       ['A', '10'],
       ['B', '5'],
       ['E', '55'],
       ['E', '55']], dtype='<U2')

Upvotes: 4

Quang Hoang
Quang Hoang

Reputation: 150785

Try:

# converting to numeric
df1['score'] = df1['score'].astype(int)
df2['score'] = df2['score'].astype(int)

out1 = df1.groupby('symbol').sum()
out2 = df2.groupby('symbol').sum()

out = out1.add(out2).dropna()

Upvotes: 2

Related Questions