Reputation: 367
I have several pandas DataFrames with around 100 columns each, and I need to build a summary table covering all of those columns. Each row of the summary DataFrame should hold a name (one per DataFrame, which I already extract correctly) followed by the mean and std of every column.
So the final table should have shape n x m, where n is the number of files and m is the number of columns x 2 (mean and std).
Something like this:
name  mean_col1  std_col1  mean_col2  std_col2
ABC   22.815293  0.103567  90.277533  0.333333
DCE   22.193991  0.12389   87.17391   0.123457
I tried the following, but I'm not getting what I want:
import glob
import pandas as pd

list_with_names = []
list_for_mean_and_std = []
for file in glob.glob("/data/path/*.csv"):
    df = pd.read_csv(file)
    output = {'name': df['name'][0]}
    list_with_names.append(output)
    numerical_cols = df.select_dtypes('float64')
    for column in numerical_cols:
        mean_col = numerical_cols[column].mean()
        std_col = numerical_cols[column].std()
        output_2 = {'mean': mean_col,
                    'std': std_col}
        list_for_mean_and_std.append(output_2)
summary = pd.DataFrame(list_with_names, list_for_mean_and_std)
This raises the error "Shape of passed values is (183, 1), indices imply (7874, 1)".
I assume it is because I'm combining the names with the mean and std values in the wrong way, but I have no idea how to do it correctly.
I would be glad for any advice on how to change it.
Upvotes: 1
Views: 532
Reputation: 711
Of course, pandas has a method for that, describe():
df.describe()
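This gives more statistics than you requested. describe() returns the statistics as rows (the index) and the original columns as columns, so if you are only interested in the mean and std, select those rows with .loc:
df.describe().loc[['mean', 'std']]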
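To get from per-file statistics to the one-row-per-file table in the question, a minimal sketch might look like the following. It assumes every file has a name column and that only float64 columns should be summarised, as in the question's code. summarize_frame and build_summary are hypothetical helper names, and the small in-memory frames with made-up values stand in for the CSV files. It uses agg(['mean', 'std']) rather than describe() so that only the two requested statistics are computed. As a side note, the error in the question arises because the second positional argument of pd.DataFrame is the index, so the list of mean/std dicts was treated as row labels rather than data.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> dict:
    """Return one summary row: the name plus mean_<col>/std_<col> per float column."""
    # Take the name from the first row, as the question's code does.
    row = {"name": df["name"].iloc[0]}
    # Rows of `stats` are 'mean' and 'std'; columns are the float64 columns.
    stats = df.select_dtypes("float64").agg(["mean", "std"])
    for column in stats.columns:
        row[f"mean_{column}"] = stats.loc["mean", column]
        row[f"std_{column}"] = stats.loc["std", column]
    return row


def build_summary(frames) -> pd.DataFrame:
    """Stack one summary row per input DataFrame into a single table."""
    return pd.DataFrame([summarize_frame(df) for df in frames])


# In-memory stand-ins for the CSV files (values are made up for illustration).
frames = [
    pd.DataFrame({"name": ["ABC", "ABC"], "col1": [1.0, 3.0], "col2": [10.0, 20.0]}),
    pd.DataFrame({"name": ["DCE", "DCE"], "col1": [2.0, 6.0], "col2": [5.0, 5.0]}),
]
summary = build_summary(frames)
print(summary)
```

With real files, the same helper could be fed directly from disk, for example build_summary(pd.read_csv(f) for f in glob.glob("/data/path/*.csv")) after import glob. Building each row as a flat dict keeps the column order name, mean_col1, std_col1, and so on, which matches the layout asked for in the question.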
Upvotes: 2