Aly

Reputation: 367

How to create a summary table for every column?

I have several pandas DataFrames with around 100 columns each, and I need to create a summary table covering all of those columns. Each row of the summary DataFrame should contain a name (one per DataFrame; this part already works) followed by the mean and std of every column.

So my final table should have the shape: n x m where n is the number of files and m is the number of columns x 2 (mean and std)

Something like this:

name    mean_col1    std_col1    mean_col2    std_col2
ABC     22.815293    0.103567    90.277533    0.333333
DCE     22.193991    0.12389     87.17391     0.123457

I tried the following, but I'm not getting what I wanted:

import glob
import pandas as pd

list_with_names = []
list_for_mean_and_std = []

for file in glob.glob("/data/path/*.csv"):
    df = pd.read_csv(file)
    
    output = {'name':df['name'][0]}
    
    list_with_names.append(output)
    
    numerical_cols = df.select_dtypes('float64')
    
    for column in numerical_cols:
        mean_col = numerical_cols[column].mean()
        std_col = numerical_cols[column].std()
        
        output_2 = {'mean': mean_col,
                    'std': std_col}
        
        list_for_mean_and_std.append(output_2)
        
    
summary = pd.DataFrame(list_with_names, list_for_mean_and_std)

I'm getting the error "Shape of passed values is (183, 1), indices imply (7874, 1)". I assume this is because I'm combining the names with the means and stds in the wrong way, but I have no idea how to do it correctly.

I would be glad for any advice on how to change it.

Upvotes: 1

Views: 532

Answers (1)

itaishz

Reputation: 711

Of course, pandas has a method for that, describe():

df.describe()

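This gives more statistics than you requested. In the result, the statistics are the rows (the index) and your original columns stay as columns, so if you are interested only in mean and std, select those rows with .loc:

df.describe().loc[['mean', 'std']]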
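To get from there to the one-row-per-file layout in the question, one option is to flatten the two statistic rows into a single dict per DataFrame and build the summary from a list of those dicts. The error in the question most likely arises because the second positional argument of pd.DataFrame is the index, not more data. The following is a minimal sketch rather than a definitive solution: the helper name summarize and the small stand-in frames are assumptions for illustration, and it selects all numeric columns where the question used only float64.

```python
import pandas as pd


def summarize(df, name_col="name"):
    """Hypothetical helper: return one flat row with the name plus
    mean_<col> and std_<col> for every numeric column of df."""
    # Rows of `stats` are the statistics, columns are the numeric columns.
    stats = df.select_dtypes("number").agg(["mean", "std"])
    row = {"name": df[name_col].iloc[0]}
    for col in stats.columns:
        row[f"mean_{col}"] = stats.loc["mean", col]
        row[f"std_{col}"] = stats.loc["std", col]
    return row


# Stand-in data (assumed) in place of the CSV files from the question.
frames = [
    pd.DataFrame({"name": ["ABC"] * 3,
                  "col1": [1.0, 2.0, 3.0],
                  "col2": [10.0, 10.0, 10.0]}),
    pd.DataFrame({"name": ["DCE"] * 2,
                  "col1": [4.0, 6.0],
                  "col2": [0.0, 2.0]}),
]

# One dict per DataFrame becomes one row of the summary table.
summary = pd.DataFrame([summarize(df) for df in frames])
print(summary)
```

With real files, the frames list could be replaced by reading each path from glob.glob("/data/path/*.csv") with pd.read_csv inside the loop, as in the question.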

Upvotes: 2
