Dana
Dana

Reputation: 3

How to get the max value of each column for each group in a dataframe?

I have the following dataframe:

      X      Y      Z      A      B
1    7.9    0.2    0.6    0.3    0.75
2    8.9    0.2    0.1    0.8    0.60
3    8.9    0.2    0.9    0.0    0.45
4    4.3    0.3    2.3    0.3    0.35
5    7.8    0.3    2.0    0.6    0.36
6    8.9    0.3    1.3    0.4    0.34
7    1.2    3.4    4.0    3.0    3.50
8    7.8    9.0    0.0    9.0    9.50

The dataframe is grouped using the 'Y' column. In each group, I would like to get the average value for 'X' and max for 'Z', 'A', and 'B'.

The result should look like this:

      X      Y      Z       A      B
0   8.56    0.2    0.9     0.8    0.75
1   7.00    0.3    2.3     0.6    0.36
2   1.20    3.4    4.0     3.0    3.50
3   7.80    9.0    0.0     9.0    9.50

Upvotes: 0

Views: 60

Answers (3)

fsimonjetz
fsimonjetz

Reputation: 5802

Assuming you're talking about a pandas dataframe you can use df.agg, specifying an aggregation function for each column.

import pandas as pd

df = pd.DataFrame({'X': [7.9, 8.9, 8.9, 4.3, 7.8, 8.9, 1.2, 7.8],
                   'Y': [0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 3.4, 9.0],
                   'Z': [0.6, 0.1, 0.9, 2.3, 2.0, 1.3, 4.0, 0.0],
                   'A': [0.3, 0.8, 0.0, 0.3, 0.6, 0.4, 3.0, 9.0],
                   'B': [0.75, 0.6, 0.45, 0.35, 0.36, 0.34, 3.5, 9.5]}))

df2 = df.groupby("Y", as_index=False).agg({"X":"mean",
                                           "Z":"max", 
                                           "A":"max", 
                                           "B":"max"})

gives

>>> df2
     Y         X    Z    A     B
0  0.2  8.566667  0.9  0.8  0.75
1  0.3  7.000000  2.3  0.6  0.36
2  3.4  1.200000  4.0  3.0  3.50
3  9.0  7.800000  0.0  9.0  9.50

Upvotes: 2

Vladyslav
Vladyslav

Reputation: 117

If I got your question right, then the main thing here is how you are storing the data. The first way on my mind (maybe not the best one) is dictionary of dictionaries

data = Dict("Y-value1" ==> Dict(\
                        "X" ==> [1, 2, 3]\
                        "Z" ==> [4, 5, 6]\
                        "A" ==> [7, 8, 9]\
                        "B" ==> [10, 11, 12])\
           "Y-value2" ==> Dict(\
                        "X" ==> [1, 2, 3]\
                        "Z" ==> [4, 5, 6]\
                        "A" ==> [7, 8, 9]\
                        "B" ==> [10, 11, 12])) # And so on

Then you could just use those standard functions for finding max, min and average (or write them by yourself)

Upvotes: 0

Gautam Chettiar
Gautam Chettiar

Reputation: 459

For getting the max values, you can try something like this:

max_value = max(df['column_name'].values)

A similar approach for getting the average as well.

Upvotes: 0

Related Questions