Reputation:
I would like to ask how I can select the highest values per each rows within the following columns
ID mean_idx269 mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 ...
1 37 42 44 40 42 35 46
2 50 41 23 64 21 64 21
3 12 42 55 10 22 12 22
My expected output would be:
ID mean_idx269 mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 ... max
1 37 42 44 40 42 35 46 46
2 50 41 23 64 21 64 21 64
3 12 42 55 10 22 12 22 55
There are other columns in the dataframe, named in a different way, but I would need to extract the maximum from those columns starting/containing mean_idx
.
The result should be a dataframe containing the ID
column and the columns containing mean_idx
.
I do not know if max()
function can be used in this case, as I need to select through more columns per each ID.
Upvotes: 3
Views: 362
Reputation: 62403
'ID'
column to the front, because it won't match the list comprehension..max
method and specify axis=1
for row wise.
df_subset[mean_cols].max(axis=1)
and not just df_subset.max(axis=1)
, otherwise ID
will also be used to determine max
.import pandas as pd
data = {'ID': [1, 2, 3],
'mean_idx269': [37, 50, 12],
'mean_idx270': [42, 41, 42],
'mean_idx271': [44, 23, 55],
'mean_idx272': [40, 64, 10],
'mean_idx273': [42, 21, 22],
'mean_idx274': [35, 64, 12],
'mean_idx275': [46, 21, 22],
'test_idx333': [1, 3, 4]}
df = pd.DataFrame(data)
# get the columns with the desired pattern
mean_cols = [col for col in df.columns if 'mean_idx' in col]
# create a subset of the dataframe with the desired columns
df_subset = df[['ID'] + mean_cols].copy()
# get max for specific cols
df_subset['max_row'] = df_subset[mean_cols].max(axis=1)
# display(df_subset)
ID mean_idx269 mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 max_row
0 1 37 42 44 40 42 35 46 46
1 2 50 41 23 64 21 64 21 64
2 3 12 42 55 10 22 12 22 55
Upvotes: 4
Reputation: 153460
Use pd.DataFrame.filter
to select only columns that match a pattern:
df['max'] = df.filter(like='mean_idx').max(axis=1)
Ouput:
ID mean_idx269 mean_idx270 mean_idx271 mean_idx272 mean_idx273 mean_idx274 mean_idx275 max
0 1 37 42 44 40 42 35 46 46
1 2 50 41 23 64 21 64 21 64
2 3 12 42 55 10 22 12 22 55
Upvotes: 6