Reputation: 531
I have a folder with different subfolder containing each a table merged.txt
having the same frame but some differences in the header names as follow:
../a/merged.txt
:
stat,a_a,b_a,c_a,d_a
std,1,2,3,4
../b/merged.txt
:
stat,a_b,b_b,c_b,d_b
std,2,3,4,5
I would like to output a table containing the median values of each rows outside of the header and row names, as follow:
stat,a,b,c,d
std,1.5,2.5,3.5,4.5
Does anyone knows how to do that? Thanks
Upvotes: 2
Views: 175
Reputation: 21
import pandas as pd
df_a = pd.read_csv('./a/merged.txt')
df_b = pd.read_csv('./b/merged.txt')
column_names = ["stat","a","b","c","d"]
df_a.columns = column_names
df_b.columns = column_names
df_combined = pd.concat([df_a, df_b])
med = df_combined.median()
df_out = pd.DataFrame(columns = column_names)
df_out.at[0,"stat"] = "std"
for c in column_names[1:]:
df_out.loc[0,c] = med[c]
print(df_out.to_csv(index=False))
...
I like @jpp 's solution more than mine...
Upvotes: 1
Reputation: 164623
Here is one way using pandas
and numpy
.
import pandas as pd
from io import StringIO
str1 = StringIO("""
stat,a_a,b_a,c_a,d_a
std,1,2,3,4""")
str2 = StringIO("""
stat,a_b,b_b,c_b,d_b
std,2,3,4,5""")
# replace str1 & str2 with 'file1.csv' and 'file2.csv'
df1 = pd.read_csv(str1)
df2 = pd.read_csv(str2)
df = pd.DataFrame(np.median([df1.iloc[:, 1:].values, df2.iloc[:, 1:].values], axis=0),
columns=list('abcd')).assign(stat=df1['stat'])
df = df[['stat', 'a', 'b', 'c', 'd']]
df.to_csv('file.csv', index=False)
# stat a b c d
# 0 std 1.5 2.5 3.5 4.5
Upvotes: 1