Alex Trevylan
Alex Trevylan

Reputation: 545

How do I average two data frames in pandas?

I have two data frames that look like this:

c453    1   8268    loc.00001   .   +   1  2  3 5  6 
c453    1   8268    loc.00002   .   +   1  1  4 5  8 
c453    1   8268    loc.00003   .   +   1  1  4 9  8 


c453    1   8268    loc.00001   .   +   2  3  3 2  6 
c453    1   8268    loc.00002   .   +   1  1  4 5  4 
c453    1   8268    loc.00003   .   +   1  1  4 9  8 

I want to average across these two files to give the following output:

c453    1   8268    loc.00001   .   +   1.5  2.5  3 3.5  6 
c453    1   8268    loc.00002   .   +   1  1  4 5  6 
c453    1   8268    loc.00003   .   +   1  1  4 9  8 

I would like to do this in pandas, as I am new to using pandas. Concatenating the two files I can do, but stuck from there. Any help would be useful.

Upvotes: 2

Views: 3628

Answers (2)

jezrael
jezrael

Reputation: 862681

I think need concat with aggregate mean, if first 4 columns, what is necessary if duplicates rows in first 4 columns in df1 or df2:

df = pd.concat([df1, df2]).groupby(df.columns.tolist()[:4]).mean()

If not, use set_index with add and divide by 2:

a = df1.set_index(df.columns.tolist()[:4])
b = df1.set_index(df.columns.tolist()[:4])
c = a.add(b).div(2).reset_index()

Upvotes: 2

John Zwinck
John Zwinck

Reputation: 249163

Don't concatenate, that just confuses things. The average of two values is the sum divided by two:

df1.set_index(['A', 'B', 'C'], inplace=True)
df2.set_index(['A', 'B', 'C'], inplace=True)
avg = (df1[['D', 'E']] + df2[['D', 'E']]) / 2

That assumes the "key" columns are A/B/C and the data columns are D/E.

Upvotes: 3

Related Questions