Reputation: 71
I have the following function for getting the column name of the last non-zero value in a row:
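import pandas as pd

def myfunc(X, Y):
    df = X.iloc[Y]              # row Y as a Series
    counter = len(df) - 1
    # walk backwards until a non-zero entry is found
    while counter >= 0:
        if df.iloc[counter] == 0:   # .iloc: positional access on a label-indexed Series
            counter -= 1
        else:
            break
    return X.columns[counter]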
Using the following example data:
data = {'id': ['1', '2', '3', '4', '5', '6'],
        'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'GGG'],
        'A1': [1, 1, 1, 0, 1, 1],
        'B1': [0, 0, 1, 0, 0, 1],
        'C1': [1, 0, 1, 1, 0, 0],
        'A2': [1, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
df
myfunc(df, 5) # 'B1'
I would like to know how I can apply this function to all rows of a dataframe and put the results into a new column of df.
I am thinking about looping over all rows (which is probably not a good approach) or using a lambda with apply. However, I have not succeeded with the latter approach. Any help?
Upvotes: 1
Views: 1910
Reputation: 5433
Here is a simpler and faster solution using DataFrame.idxmax.
>>> res = df.iloc[:, :1:-1].idxmax(axis=1)
>>> res
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object
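The idea is to select only the value columns (A1, B1, C1, A2, i.e. everything after id and name) in reversed order (df.iloc[:, :1:-1]) and then return the column label of the first occurrence of the maximum (1 in this case) for each row (.idxmax(axis=1)). Because the columns are reversed, that first occurrence is the last non-zero column in the original order.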
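To make the slicing step concrete, here is a small sketch that rebuilds the example frame from the question and inspects the intermediate selection. The variable names reversed_values and last_nonzero are only for illustration.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'GGG'],
    'A1': [1, 1, 1, 0, 1, 1],
    'B1': [0, 0, 1, 0, 0, 1],
    'C1': [1, 0, 1, 1, 0, 0],
    'A2': [1, 0, 1, 0, 1, 0],
})

# Step backwards from the last column and stop before position 1 ('name'),
# so 'id' and 'name' are dropped and the remaining columns are reversed.
reversed_values = df.iloc[:, :1:-1]
print(list(reversed_values.columns))  # ['A2', 'C1', 'B1', 'A1']

# First maximum per row in the reversed frame == last 1 in the original order
last_nonzero = reversed_values.idxmax(axis=1)
print(last_nonzero.tolist())
```

The hard-coded 1 in the slice assumes the two leading non-numeric columns of the example; a frame with a different layout would need a different stop position.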
Note that this solution (like the other answer) assumes that each row contains at least one entry higher than zero. It also relies on the non-zero values all being equal, as with the 0/1 data here, because idxmax looks for the row maximum rather than for any non-zero value.
Both assumptions can be relaxed to 'each row contains at least one non-zero entry' if we first mask the non-zero entries (using .ne(0)). This works because .ne(0) produces a boolean mask and True > False <=> 1 > 0.
>>> res = df.iloc[:, :1:-1].ne(0).idxmax(axis=1)
>>> res
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object
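If some rows may contain no non-zero entry at all, idxmax on the mask still returns the first column it sees, which would be misleading. One possible way to handle that, sketched here on a small made-up frame (the data and the column name last_nonzero are assumptions for illustration, not part of the question), is to blank out those rows with Series.where and write the result into a new column:

```python
import pandas as pd

# Hypothetical data: row 1 has no non-zero entry, row 2 has a value other than 1
df = pd.DataFrame({
    'id': ['1', '2', '3'],
    'name': ['AAA', 'BBB', 'CCC'],
    'A1': [1, 0, 0],
    'B1': [0, 0, 2],
    'C1': [3, 0, 0],
})

# Reversed boolean mask of the value columns: True where the entry is non-zero
nonzero = df.iloc[:, :1:-1].ne(0)

# idxmax returns the first column for an all-False row,
# so keep the label only where the row has at least one True
df['last_nonzero'] = nonzero.idxmax(axis=1).where(nonzero.any(axis=1))
print(df)
```

Rows without any non-zero entry end up with a missing value in the new column instead of a wrong label.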
Upvotes: 1
Reputation:
I've modified your function a little bit to work across rows:
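def myfunc(row):
    counter = len(row) - 1
    # walk backwards until a non-zero entry is found
    while counter >= 0:
        if row.iloc[counter] == 0:   # .iloc: positional access on a label-indexed Series
            counter -= 1
        else:
            break
    return row.index[counter]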
Now just call df.apply with your function and axis=1 to call the function for each row of the dataframe:
>>> df.apply(myfunc, axis=1)
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object
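Since the question also asks for the result in a new column, a minimal sketch of that step could look like the following. The column name last_nonzero and the restriction to the value columns via df.columns[2:] are choices made for this illustration.

```python
import pandas as pd

def myfunc(row):
    # Walk backwards from the end of the row until a non-zero entry is found
    counter = len(row) - 1
    while counter >= 0:
        if row.iloc[counter] == 0:
            counter -= 1
        else:
            break
    return row.index[counter]

# Example frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'GGG'],
    'A1': [1, 1, 1, 0, 1, 1],
    'B1': [0, 0, 1, 0, 0, 1],
    'C1': [1, 0, 1, 1, 0, 0],
    'A2': [1, 0, 1, 0, 1, 0],
})

# Apply row-wise to the value columns only and store the labels in a new column
value_cols = df.columns[2:]
df['last_nonzero'] = df[value_cols].apply(myfunc, axis=1)
print(df)
```

Selecting the value columns first keeps the string columns id and name out of the comparison with 0.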
However, you can ditch your custom function and use this code to do what you're looking for in a much faster and more concise manner:
>>> df[df.columns[2:]].T.cumsum().idxmax()
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object
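To see why the cumulative sum works, it may help to look at one row step by step. The sketch below rebuilds the example frame and prints the running total for row 5; the names values, running and result are only for illustration, and the reasoning assumes the values are non-negative.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5', '6'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'GGG'],
    'A1': [1, 1, 1, 0, 1, 1],
    'B1': [0, 0, 1, 0, 0, 1],
    'C1': [1, 0, 1, 1, 0, 0],
    'A2': [1, 0, 1, 0, 1, 0],
})

values = df[df.columns[2:]]      # value columns only
running = values.T.cumsum()      # one column per original row, summed over A1..A2

# Row 5 is [1, 1, 0, 0]: the running total stops growing after 'B1'
print(running[5].tolist())       # [1, 2, 2, 2]

# idxmax returns the first label at which the total reaches its maximum,
# i.e. the last column that still added something
result = running.idxmax()
print(result.tolist())
```

With negative values the running total could peak before the last non-zero entry, so this variant is best suited to counts or 0/1 flags like the ones in the question.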
Upvotes: 2