Reputation: 545
I have two data frames that look like this:
c453 1 8268 loc.00001 . + 1 2 3 5 6
c453 1 8268 loc.00002 . + 1 1 4 5 8
c453 1 8268 loc.00003 . + 1 1 4 9 8
c453 1 8268 loc.00001 . + 2 3 3 2 6
c453 1 8268 loc.00002 . + 1 1 4 5 4
c453 1 8268 loc.00003 . + 1 1 4 9 8
I want to average across these two files to give the following output:
c453 1 8268 loc.00001 . + 1.5 2.5 3 3.5 6
c453 1 8268 loc.00002 . + 1 1 4 5 6
c453 1 8268 loc.00003 . + 1 1 4 9 8
I would like to do this in pandas, as I am new to using pandas. Concatenating the two files I can do, but stuck from there. Any help would be useful.
Upvotes: 2
Views: 3628
Reputation: 862681
I think need concat
with aggregate mean
, if first 4 columns, what is necessary if duplicates rows in first 4 columns in df1
or df2
:
df = pd.concat([df1, df2]).groupby(df.columns.tolist()[:4]).mean()
If not, use set_index
with add
and divide by 2
:
a = df1.set_index(df.columns.tolist()[:4])
b = df1.set_index(df.columns.tolist()[:4])
c = a.add(b).div(2).reset_index()
Upvotes: 2
Reputation: 249163
Don't concatenate, that just confuses things. The average of two values is the sum divided by two:
df1.set_index(['A', 'B', 'C'], inplace=True)
df2.set_index(['A', 'B', 'C'], inplace=True)
avg = (df1[['D', 'E']] + df2[['D', 'E']]) / 2
That assumes the "key" columns are A/B/C and the data columns are D/E.
Upvotes: 3