Igor
Igor

Reputation: 55

Calculating mean values in dataframe and adding to new columns

I need to calculate average values (row wise without index) of columns with constant step.

I have already done a simple operation for the first 4 columns. It works nicely. After that I have created a list with column names (for storing average values) for dataframe. I have found out that I can do this using apply and lambda. I have tried many variants to get a result, but I have not found a solution.

data= np.arange(400).reshape(20,20)
df=pd.DataFrame(data=data)
df.columns=['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T']
df['A1_avg'] = df[['A', 'B', 'C', 'D']].mean(axis=1)
colnames_avg=['A1_avg', 'A2_avg', 'A3_avg', 'A4_avg', 'A5_avg']
df.head()

I have tried this code for generating 5 extra columns containing the average of several subsets of data:

 df[colnames_avg]=df[colnames_avg].applymap(lambda x: df[['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H'], ['I', 'J', 'K', 'L'],['M', 'N', 'O', 'P'],['Q', 'R', 'S', 'T']].mean(axis=1)

Is it possible to do this with the range function with a predefined step (e.g. 4)?

Upvotes: 3

Views: 896

Answers (2)

Erfan
Erfan

Reputation: 42896

Method 1: numpy.split & DataFrame.loc:

We can split your columns into evenly size chunks and then use .loc to create the new columns:

for idx, chunk in enumerate(np.split(df.columns, len(df.columns)/4)):
    df[f'A{idx+1}_avg'] = df.loc[:, chunk].mean(axis=1)

Output

      A    B    C    D    E    F    G    H    I    J  ...    P    Q    R    S    T  A1_avg  A2_avg  A3_avg  A4_avg  A5_avg
0     0    1    2    3    4    5    6    7    8    9  ...   15   16   17   18   19     1.5     5.5     9.5    13.5    17.5
1    20   21   22   23   24   25   26   27   28   29  ...   35   36   37   38   39    21.5    25.5    29.5    33.5    37.5
2    40   41   42   43   44   45   46   47   48   49  ...   55   56   57   58   59    41.5    45.5    49.5    53.5    57.5
3    60   61   62   63   64   65   66   67   68   69  ...   75   76   77   78   79    61.5    65.5    69.5    73.5    77.5
4    80   81   82   83   84   85   86   87   88   89  ...   95   96   97   98   99    81.5    85.5    89.5    93.5    97.5
5   100  101  102  103  104  105  106  107  108  109  ...  115  116  117  118  119   101.5   105.5   109.5   113.5   117.5
6   120  121  122  123  124  125  126  127  128  129  ...  135  136  137  138  139   121.5   125.5   129.5   133.5   137.5
7   140  141  142  143  144  145  146  147  148  149  ...  155  156  157  158  159   141.5   145.5   149.5   153.5   157.5
8   160  161  162  163  164  165  166  167  168  169  ...  175  176  177  178  179   161.5   165.5   169.5   173.5   177.5
9   180  181  182  183  184  185  186  187  188  189  ...  195  196  197  198  199   181.5   185.5   189.5   193.5   197.5
10  200  201  202  203  204  205  206  207  208  209  ...  215  216  217  218  219   201.5   205.5   209.5   213.5   217.5
11  220  221  222  223  224  225  226  227  228  229  ...  235  236  237  238  239   221.5   225.5   229.5   233.5   237.5
12  240  241  242  243  244  245  246  247  248  249  ...  255  256  257  258  259   241.5   245.5   249.5   253.5   257.5
13  260  261  262  263  264  265  266  267  268  269  ...  275  276  277  278  279   261.5   265.5   269.5   273.5   277.5
14  280  281  282  283  284  285  286  287  288  289  ...  295  296  297  298  299   281.5   285.5   289.5   293.5   297.5
15  300  301  302  303  304  305  306  307  308  309  ...  315  316  317  318  319   301.5   305.5   309.5   313.5   317.5
16  320  321  322  323  324  325  326  327  328  329  ...  335  336  337  338  339   321.5   325.5   329.5   333.5   337.5
17  340  341  342  343  344  345  346  347  348  349  ...  355  356  357  358  359   341.5   345.5   349.5   353.5   357.5
18  360  361  362  363  364  365  366  367  368  369  ...  375  376  377  378  379   361.5   365.5   369.5   373.5   377.5
19  380  381  382  383  384  385  386  387  388  389  ...  395  396  397  398  399   381.5   385.5   389.5   393.5   397.5

Method 2: .range & iloc:

We can create a range for each 4 columns, then use iloc to acces each slice of your dataframe and calculate the mean and at the same time create your new column:

slices = range(0, len(df.columns)+1, 4)

for idx, rng in enumerate(slices):
    if idx > 0:
        df[f'A{idx}_avg'] = df.iloc[:, slices[idx-1]:slices[idx]].mean(axis=1)

Output

      A    B    C    D    E    F    G    H    I    J  ...    P    Q    R    S    T  A1_avg  A2_avg  A3_avg  A4_avg  A5_avg
0     0    1    2    3    4    5    6    7    8    9  ...   15   16   17   18   19     1.5     5.5     9.5    13.5    17.5
1    20   21   22   23   24   25   26   27   28   29  ...   35   36   37   38   39    21.5    25.5    29.5    33.5    37.5
2    40   41   42   43   44   45   46   47   48   49  ...   55   56   57   58   59    41.5    45.5    49.5    53.5    57.5
3    60   61   62   63   64   65   66   67   68   69  ...   75   76   77   78   79    61.5    65.5    69.5    73.5    77.5
4    80   81   82   83   84   85   86   87   88   89  ...   95   96   97   98   99    81.5    85.5    89.5    93.5    97.5
5   100  101  102  103  104  105  106  107  108  109  ...  115  116  117  118  119   101.5   105.5   109.5   113.5   117.5
6   120  121  122  123  124  125  126  127  128  129  ...  135  136  137  138  139   121.5   125.5   129.5   133.5   137.5
7   140  141  142  143  144  145  146  147  148  149  ...  155  156  157  158  159   141.5   145.5   149.5   153.5   157.5
8   160  161  162  163  164  165  166  167  168  169  ...  175  176  177  178  179   161.5   165.5   169.5   173.5   177.5
9   180  181  182  183  184  185  186  187  188  189  ...  195  196  197  198  199   181.5   185.5   189.5   193.5   197.5
10  200  201  202  203  204  205  206  207  208  209  ...  215  216  217  218  219   201.5   205.5   209.5   213.5   217.5
11  220  221  222  223  224  225  226  227  228  229  ...  235  236  237  238  239   221.5   225.5   229.5   233.5   237.5
12  240  241  242  243  244  245  246  247  248  249  ...  255  256  257  258  259   241.5   245.5   249.5   253.5   257.5
13  260  261  262  263  264  265  266  267  268  269  ...  275  276  277  278  279   261.5   265.5   269.5   273.5   277.5
14  280  281  282  283  284  285  286  287  288  289  ...  295  296  297  298  299   281.5   285.5   289.5   293.5   297.5
15  300  301  302  303  304  305  306  307  308  309  ...  315  316  317  318  319   301.5   305.5   309.5   313.5   317.5
16  320  321  322  323  324  325  326  327  328  329  ...  335  336  337  338  339   321.5   325.5   329.5   333.5   337.5
17  340  341  342  343  344  345  346  347  348  349  ...  355  356  357  358  359   341.5   345.5   349.5   353.5   357.5
18  360  361  362  363  364  365  366  367  368  369  ...  375  376  377  378  379   361.5   365.5   369.5   373.5   377.5
19  380  381  382  383  384  385  386  387  388  389  ...  395  396  397  398  399   381.5   385.5   389.5   393.5   397.5

[20 rows x 25 columns]

Upvotes: 1

jottbe
jottbe

Reputation: 4521

I would do that as follows in a loop, going over the columns and cutting them into groups of 4 columns each (the last group might be smaller):

cols=list(df.columns)
while len(cols) > 0:
    group= cols[:4]
    cols= cols[4:]
    df['mean_' + '_'.join(group)]= df[group].mean(axis='columns')

The result looks like

df[[col for col in df if col.startswith('mean_')]]

    mean_A_B_C_D  mean_E_F_G_H  mean_I_J_K_L  mean_M_N_O_P  mean_Q_R_S_T
0            1.5           5.5           9.5          13.5          17.5
1           21.5          25.5          29.5          33.5          37.5
2           41.5          45.5          49.5          53.5          57.5
3           61.5          65.5          69.5          73.5          77.5
4           81.5          85.5          89.5          93.5          97.5
5          101.5         105.5         109.5         113.5         117.5
...

If you want result columns like A1..., just add a counter variable in the loop and use 'A{}'.format(i) as the column name.

Upvotes: 2

Related Questions