Reputation: 11824
I have the following code:
import numpy as np
import pandas as pd
my_array1 = np.array([
["A",5],
["A",10],
["B",5],
["E",55],
["E",55],
])
my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])
df1 = pd.DataFrame(my_array1, columns = ['symbol','score'])
df2 = pd.DataFrame(my_array2, columns = ['symbol','score'])
What i want is form the above two dataframes is
first sum all the values in each table w.r.t score
my_array1 = np.array([
["A",5],
["A",10],
["B",5],
["E",55],
["E",55],
])
becomes
my_array1 = np.array([
["A",15],
["B",5],
["E",110],
])
and
my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])
becomes
my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])
and then
get the only the common between both w.r.t column symbol and again show the total
i.e
From
my_array1 = np.array([
["A",15],
["B",5],
["E",110],
])
and
my_array2 = np.array([
["A",22],
["B",55],
["C",22],
["D",55],
])
get
my_array2 = np.array([
["A",37],
["B",60],
])
Upvotes: 1
Views: 31
Reputation: 323316
We need convert to numeric first, then groupby
with sum
df1.score = pd.to_numeric(df1.score)
df2.score = pd.to_numeric(df2.score)
com = np.intersect1d(df1.symbol,df2.symbol)
pd.concat([df1,df2]).loc[lambda x : x['symbol'].isin(com)].groupby('symbol').sum()
Out[44]:
score
symbol
A 37
B 60
The numpy array only accept one type for each sub-array , so the whole array become ovbject not numeric anymore.
Check the output
my_array1
Out[35]:
array([['A', '5'],
['A', '10'],
['B', '5'],
['E', '55'],
['E', '55']], dtype='<U2')
Upvotes: 4
Reputation: 150785
Try:
# converting to numeric
df1['score'] = df1['score'].astype(int)
df2['score'] = df2['score'].astype(int)
out1 = df1.groupby('symbol').sum()
out2 = df2.groupby('symbol').sum()
out = out1.add(out2).dropna()
Upvotes: 2