Reputation: 91

Pandas for two dataframes calculation

I have two dataframes and want to do some calculation within the dataframe. But I cannot get the result I want. Please check my script.

import pandas as pd

df1 = pd.DataFrame({'A': [0.1,0.2,0.3,0.1,0.2,0.3], 'B': [7,8,9,10,11,12], 'C': ['k','k','k','j','j','j']})
df2 = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [1,2,3,4,5,6], 'C': ['k','k','k','j','j','j']})

df3=df1[df1['C']=='k']['A'].add(df2[df2['C']=='k']['A'][-1:], fill_value=0)

When I check df3, I got:

Block quote

0 0.1
1 0.2
2 3.3
Name: A, dtype: float64

Block quote

Actually, I want every data in df1 column A with column C is “k” plus the last one data in df2 column A with column C is “k”. However, now only the last one data in df1 column A did the addition and column B and C gone.

Anyone who knows how to achieve my goals? Thanks!

Upvotes: 0

Answers (3)

Sheldon

Reputation: 4633

df3 = df1[df1['C']=='k']['A']+df2[df2['C']=='k']['A'].iloc[-1:].iloc[0]

yields:

0    3.1
1    3.2
2    3.3

In the code above, I am extracting the last value in df2[df2['C']=='k']['A'], i.e. 3.0 and adding it to all rows in df1[df1['C']=='k']['A']

[EDIT] To answer your question in the comments section below, you can save the B and C columns from df1 in a separate dataframe (e.g. df4) and then concatenate them back with df3:

df4 = df1[df1['C']=='k'][['B','C']]
df5 = pd.concat([df3,df4],axis=1)

Which yields:

     A  B  C
0  3.1  7  k
1  3.2  8  k
2  3.3  9  k

The resulting df5 dataframe can then be concatenated with a dataframe containing the rows of df1 for which the value in column C is not equal to k:

df6 = df1[df1['C']!='k']
df7 = pd.concat([df5,df6])

Which yields:

     A   B  C
0  3.1   7  k
1  3.2   8  k
2  3.3   9  k
3  0.1  10  j
4  0.2  11  j
5  0.3  12  j

Upvotes: 1

YusufUMS

Reputation: 1493

Not clearly understand your goal. My understanding is that you want to calculate the values in df1 column A and the values in df2 column A, which the values in both columns are in the same position with k in column C.

df3=(df1[df1['C']=='k']['A']) + (df2[df2['C']=='k']['A'])

Output:
0    1.1
1    2.2
2    3.3
Name: A, dtype: float64

Upvotes: 0

sridhar er

Reputation: 124

import pandas as pd 

df1 = pd.DataFrame({'A': [0.1,0.2,0.3,0.1,0.2,0.3], 'B': [7,8,9,10,11,12], 'C': ['k','k','k','j','j','j']})
df2 = pd.DataFrame({'A': [1,2,3,1,2,3], 'B': [1,2,3,4,5,6], 'C': ['k','k','k','j','j','j']})

df1 = df1[df1.C == 'k'].A
df2 = df2[df2.C == 'k'].A
df1.add(df2)

The above code results in:

0    1.1
1    2.2
2    3.3
Name: A, dtype: float64

pd.concat([df1[df1['C']=='k']['A'], df2[df2['C']=='k']['A']])

The above line results in

0    0.1
1    0.2
2    0.3
0    1.0
1    2.0
2    3.0

Upvotes: 1

Pandas for two dataframes calculation

Answers (3)

Related Questions