Jagruth
Jagruth

Reputation: 143

Adding values to all rows of dataframe

I have two pandas dataframes df1 (of length 2) and df2 (of length about 30 rows). Index values of df1 are always different and never occur in df2. I would like to add the average of columns from df1 to corresponding columns of df2. Example: add 0.6 to all rows of c1 and 0.9 to all rows of c2 etc ...

df1: 
  Date       c1   c2   c3   c4    c5   c6 ...  c10
2017-09-10  0.5  0.6  1.2   0.7  1.3  1.8 ...  1.3
2017-09-11  0.7  1.2  1.3   0.4  0.7  0.4 ...  1.5


df2:
  Date       c1   c2   c3   c4    c5   c6 ...  c10
2017-09-12  0.9  0.1  1.4   0.9  1.5  1.9 ...  1.9
2017-09-13  0.2  1.8  1.2   1.4  2.7  0.8 ...  1.1
    :                                  :  
    :                                  :     
2017-10-10  1.5  0.9  1.5   0.9  1.6  1.8 ...  1.7
2017-10-11  2.7  1.1  1.9   0.4  0.8  0.8 ...  1.3

How can I do that ?

Upvotes: 2

Views: 19783

Answers (4)

pankaj mishra
pankaj mishra

Reputation: 2615

This could be another way of doing it , just simplified this a little bit

import pandas as pd
import numpy as np
from datetime import datetime, timedelta 
date_today=datetime.now()

#Creating df1 & df2 
df1=pd.DataFrame(
    {
        'Date':[date_today,date_today],
        'c1':[0.5,0.4],
        'c2':[0.6,0.3]
    }
)
df2=pd.DataFrame(
    {
        'Date':[date_today,date_today,date_today],
        'c1':[0.9,0.7,0.6],
        'c2':[0.8,0.4,0.3]
    }
)


#getting average of column c1
avg=df1["c1"].mean()

#Adding the average to your existing column of df2
df2['c1']+avg

Upvotes: 0

piRSquared
piRSquared

Reputation: 294218

When using mean on df1, it calculates over each column by default and produces a pd.Series.

When adding adding a pd.Series to a pd.DataFrame it aligns the index of the pd.Series with the columns of the pd.DataFrame and broadcasts along the index of the pd.DataFrame... by default.

The only tricky bit is handling the Date column.

Option 1

m = df1.mean()
df2.loc[:, m.index] += m

df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If I know that 'Date' is always in the first column, I can:

df2.iloc[:, 1:] += df1.mean()
df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

Option 2
Notice that I use the append=True parameter in the set_index just incase there are things in the index you don't want to mess up.

df2.set_index('Date', append=True).add(df1.mean()).reset_index('Date')

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If you don't care about the index, you can shorten this to

df2.set_index('Date').add(df1.mean()).reset_index()

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

Upvotes: 4

chumbak
chumbak

Reputation: 21

There is probably a more efficient way but here is a quick and dirty solution. I hope this helps!

d = {'c1': [0.5,0.7], 'c2': [0.6,1.2],'c3': [1.2,1.3]}
df1 = pd.DataFrame(data=d, index=['2017-09-10','2017-09-11'])
df2 = pd.DataFrame(data=d, index=['2017-09-12','2017-09-13'])

df1

      Date   c1 c2  c3
2017-09-10  0.5 0.6 1.2
2017-09-11  0.7 1.2 1.3

df2

Date   c1   c2  c3
2017-09-12  0.5 0.6 1.2
2017-09-13  0.7 1.2 1.3

The averages of each column in df1 can be obtained using the describe() function

df1.describe().ix['mean']

c1    0.60
c2    0.90
c3    1.25

And now, simply add the series to df2

df2 + df1.describe().ix['mean']

Date     c1 c2  c3
2017-09-12  1.1 1.5 2.45
2017-09-13  1.3 2.1 2.55

Upvotes: 1

jondo
jondo

Reputation: 21

If all columns are in both data frames, then just

for col in df2.columns:
    df2[col] = df2[col] + df1[col].mean()

if the columns are not necessarily in both then:

for col in df2.columns:
    if col in df1.columns:
        df2[col] = df2[col] + df1[col].mean()

Upvotes: 1

Related Questions