steve
steve

Reputation: 531

Python calculate median value of different dataframes

I have a folder with different subfolder containing each a table merged.txt having the same frame but some differences in the header names as follow:

../a/merged.txt:

stat,a_a,b_a,c_a,d_a
std,1,2,3,4

../b/merged.txt:

stat,a_b,b_b,c_b,d_b
std,2,3,4,5

I would like to output a table containing the median values of each rows outside of the header and row names, as follow:

stat,a,b,c,d
std,1.5,2.5,3.5,4.5

Does anyone knows how to do that? Thanks

Upvotes: 2

Views: 175

Answers (2)

mh70
mh70

Reputation: 21

import pandas as pd

df_a = pd.read_csv('./a/merged.txt')
df_b = pd.read_csv('./b/merged.txt')

column_names = ["stat","a","b","c","d"]

df_a.columns = column_names
df_b.columns = column_names

df_combined = pd.concat([df_a, df_b])
med = df_combined.median()

df_out = pd.DataFrame(columns = column_names)
df_out.at[0,"stat"] = "std"
for c in column_names[1:]:
    df_out.loc[0,c] = med[c]

print(df_out.to_csv(index=False))

...
I like @jpp 's solution more than mine...

Upvotes: 1

jpp
jpp

Reputation: 164623

Here is one way using pandas and numpy.

import pandas as pd
from io import StringIO

str1 = StringIO("""
stat,a_a,b_a,c_a,d_a
std,1,2,3,4""")

str2 = StringIO("""
stat,a_b,b_b,c_b,d_b
std,2,3,4,5""")

# replace str1 & str2 with 'file1.csv' and 'file2.csv'
df1 = pd.read_csv(str1)
df2 = pd.read_csv(str2)

df = pd.DataFrame(np.median([df1.iloc[:, 1:].values, df2.iloc[:, 1:].values], axis=0),
                  columns=list('abcd')).assign(stat=df1['stat'])

df = df[['stat', 'a', 'b', 'c', 'd']]

df.to_csv('file.csv', index=False)

#   stat    a    b    c    d
# 0  std  1.5  2.5  3.5  4.5

Upvotes: 1

Related Questions