Reputation: 257
So, say I have a sample DataFrame as:
import pandas as pd
x = pd.DataFrame({"Name": ["A", "B", "C"], "total_1": [1, 2, 3], "total_2": [7, 8, 9], "total_3": [9, 10, 11]})
What I would like to do is create a new DataFrame containing the median, taken along each row, over all columns whose names contain the substring total. That is, the new DataFrame would have a column equal to [7, 8, 9].
I think I could do it if I could sub-select the columns with total in their names and then compute the median along axis 1, but I am not sure how to do this selection. I do not know a priori how many such columns I will have.
Upvotes: 2
Views: 3675
Reputation: 6323
Your reasoning was spot on. Here it is in code.
# Build a list of every column name that contains the substring 'total'
cols = [col for col in x.columns if 'total' in col]
# Add the row-wise median of those columns as a new column
x['median'] = x[cols].median(axis=1)
# Result
print(x)
Name total_1 total_2 total_3 median
0 A 1 7 9 7.0
1 B 2 8 10 8.0
2 C 3 9 11 9.0
Note that axis=1 tells median() to work across the columns, producing one value per row. In other words, it operates horizontally.
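To make the axis argument concrete, here is a small sketch that rebuilds the question's DataFrame and compares axis=0 (the default, one median per column) with axis=1 (one median per row). The variable names per_column and per_row are just illustrative.

```python
import pandas as pd

x = pd.DataFrame({"Name": ["A", "B", "C"],
                  "total_1": [1, 2, 3],
                  "total_2": [7, 8, 9],
                  "total_3": [9, 10, 11]})
cols = [col for col in x.columns if 'total' in col]

# axis=0 (the default): one median per column, computed down the rows
per_column = x[cols].median(axis=0)
# axis=1: one median per row, computed across the selected columns
per_row = x[cols].median(axis=1)

print(per_column.tolist())  # [2.0, 8.0, 10.0]
print(per_row.tolist())     # [7.0, 8.0, 9.0]
```

Only the axis=1 result matches the [7, 8, 9] asked for in the question.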
Upvotes: 1
Reputation: 26676
x['median'] = x.filter(like='total').apply(lambda row: row.median(), axis=1)
Name total_1 total_2 total_3 median
0 A 1 7 9 7.0
1 B 2 8 10 8.0
2 C 3 9 11 9.0
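The question asked for a new DataFrame rather than an extra column in x. A minimal sketch, assuming the result should keep the Name column next to the medians: DataFrame.median(axis=1) already works row by row, so the apply/lambda step can be dropped. The regex variant and the output column name "median" are illustrative choices, not part of the original answer.

```python
import pandas as pd

x = pd.DataFrame({"Name": ["A", "B", "C"],
                  "total_1": [1, 2, 3],
                  "total_2": [7, 8, 9],
                  "total_3": [9, 10, 11]})

# filter(like=...) keeps columns whose label contains the substring anywhere
totals = x.filter(like='total')
# filter(regex=...) is stricter: here, only labels that start with "total_"
totals_prefix = x.filter(regex=r'^total_')

# Build a separate DataFrame; x itself is left unchanged
result = pd.DataFrame({"Name": x["Name"], "median": totals.median(axis=1)})
print(result)
```

Because median(axis=1) is vectorized, it avoids calling a Python lambda once per row, which matters on large frames.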
Upvotes: 4
Reputation: 13106
Select your columns using a list comprehension, then apply the median as you stated:
cols = [col for col in x.columns if 'total' in col]
x['newcol'] = x[cols].median(axis=1)
Name total_1 total_2 total_3 newcol
0 A 1 7 9 7.0
1 B 2 8 10 8.0
2 C 3 9 11 9.0
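One caveat with 'total' in col: it is a plain substring test, so a column such as subtotal would also be picked up and silently change the result. The sketch below adds a hypothetical subtotal column to show the difference and uses a stricter prefix check via str.startswith, assuming your columns really follow the total_<n> naming.

```python
import pandas as pd

# "subtotal" is a hypothetical extra column, added only to show the pitfall
y = pd.DataFrame({"Name": ["A", "B", "C"],
                  "total_1": [1, 2, 3],
                  "total_2": [7, 8, 9],
                  "total_3": [9, 10, 11],
                  "subtotal": [100, 100, 100]})

# Substring test: also matches "subtotal"
loose = [col for col in y.columns if 'total' in col]
# Prefix test: only the intended total_<n> columns
strict = [col for col in y.columns if col.startswith('total_')]

# Same prefix test as a boolean mask over the column Index
mask = y.columns.str.startswith('total_')
y['newcol'] = y.loc[:, mask].median(axis=1)
```

With the substring test, the first row's median would become 8.0 instead of 7.0, because subtotal joins the calculation.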
Upvotes: 4