user12907213
user12907213

Reputation:

Extract maximum value from columns having similar names

I would like to ask how I can select the highest values per each rows within the following columns

 ID   mean_idx269   mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 ... 

 1      37          42          44           40          42          35         46      
 2      50          41          23           64          21          64         21
 3      12          42          55           10          22          12         22

My expected output would be:

 ID   mean_idx269   mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 ... max

 1      37          42          44           40          42          35         46           46   
 2      50          41          23           64          21          64         21           64
 3      12          42          55           10          22          12         22           55

There are other columns in the dataframe, named in a different way, but I would need to extract the maximum from those columns starting/containing mean_idx.

The result should be a dataframe containing the ID column and the columns containing mean_idx.

I do not know if max() function can be used in this case, as I need to select through more columns per each ID.

Upvotes: 3

Views: 362

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62403

  • Use a list comprehension to get the desired columns with matching words
  • Create a separate dataframe containing only the desired columns
    • Add the 'ID' column to the front, because it won't match the list comprehension.
  • Use the .max method and specify axis=1 for row wise.
    • df_subset[mean_cols].max(axis=1) and not just df_subset.max(axis=1), otherwise ID will also be used to determine max.
import pandas as pd

data = {'ID': [1, 2, 3],
        'mean_idx269': [37, 50, 12],
        'mean_idx270': [42, 41, 42],
        'mean_idx271': [44, 23, 55],
        'mean_idx272': [40, 64, 10],
        'mean_idx273': [42, 21, 22],
        'mean_idx274': [35, 64, 12],
        'mean_idx275': [46, 21, 22],
        'test_idx333': [1, 3, 4]}

df = pd.DataFrame(data)

# get the columns with the desired pattern
mean_cols = [col for col in df.columns if 'mean_idx' in col]

# create a subset of the dataframe with the desired columns
df_subset = df[['ID'] + mean_cols].copy()

# get max for specific cols
df_subset['max_row'] = df_subset[mean_cols].max(axis=1)

# display(df_subset)
   ID  mean_idx269  mean_idx270  mean_idx271  mean_idx272  mean_idx273  mean_idx274  mean_idx275  max_row
0   1           37           42           44           40           42           35           46       46
1   2           50           41           23           64           21           64           21       64
2   3           12           42           55           10           22           12           22       55

Upvotes: 4

Scott Boston
Scott Boston

Reputation: 153460

Use pd.DataFrame.filter to select only columns that match a pattern:

df['max'] = df.filter(like='mean_idx').max(axis=1)

Ouput:

   ID  mean_idx269  mean_idx270  mean_idx271  mean_idx272  mean_idx273  mean_idx274  mean_idx275  max
0   1           37           42           44           40           42           35           46   46
1   2           50           41           23           64           21           64           21   64
2   3           12           42           55           10           22           12           22   55

Upvotes: 6

Related Questions