Reputation: 990
I have a dataframe like this -
Alpha Id Col1 Col2 Col3 Col4
0 a Col3 0 0 10 34
1 b Col2 0 5 0 4
2 c Col1 16 0 3 0
3 d Col2 0 0 0 9
4 e Col3 0 0 18 0
5 f Col1 0 14 0 29
For each row, I need to find the column name of the first non-zero value occurring after the column specified in ID
So that the resulting dataframe looks like this -
Alpha Id Col1 Col2 Col3 Col4 Result
0 a Col3 0 0 10 34 Col4
1 b Col2 0 5 0 4 Col4
2 c Col1 16 0 3 0 Col3
3 d Col2 0 0 0 9 Col4
4 e Col3 0 0 18 0 0
5 f Col1 0 14 0 29 Col2
I know idxmax()
can be used for getting the first non-zero entry in a row.
But how can I specify the condition here?
Thanks in advance for any solution/hint.
Upvotes: 3
Views: 519
Reputation: 33960
For each row, you want to look up the slice of columns starting with the column specified in 'Id', rightwards.
If your spec didn't say first non-zero value occurring after the column specified, we could do the straightforward:
>>> df['Result'] = df.apply(lambda row: row.loc[ row['Id']: ].ne(0).idxmin(), axis=1)
>>> df
Alpha Id Col1 Col2 Col3 Col4 Result
0 a Col3 0 0 10 34 Col3
1 b Col2 0 5 0 4 Col3
2 c Col1 16 0 3 0 Col2
3 d Col2 0 0 0 9 Col2
4 e Col3 0 0 18 0 Col4 # <-- wrong, should slice from Col3...
5 f Col1 0 14 0 29 Col1
But in order to implement first non-zero value occurring after the column specified, we'd need to compute a column-slices using integer indices rather than names:
>>> cols = df.columns[2:]
Index(['Col1', 'Col2', 'Col3', 'Col4'], dtype='object')
# Now we can do cols.get_loc(row['Id'])
>>> df.apply(lambda row: cols.get_loc(row['Id']), axis=1)
0 2
1 1
2 0
3 1
4 2
5 0
The rest will be something like looking for the index of the leftmost nonzero entry:
df.apply(lambda row: row.iloc[ (df.columns.get_loc(row['Id'])+2) : ].ne(0).idxmin(), axis=1)
Upvotes: 0
Reputation: 2407
After a simple, not really pandas way preparation, things become easier:
cols= ["Col1", "Col2", "Col3", "Col4"]
df2=df.copy()
# A simple preparation:
for idx,cc in zip(df2.index,df2.Id):
df2.at[idx,cc]=0
crit= df2[cols].ne(0)
df["nz"]= crit.idxmax(axis=1)[crit.any(axis=1)]
Alpha Id Col1 Col2 Col3 Col4 nz
0 a Col3 0 0 10 34 Col4
1 b Col2 0 5 0 4 Col4
2 c Col1 16 0 3 0 Col3
3 d Col2 0 0 0 9 Col4
4 e Col3 0 0 18 0 NaN
5 f Col1 0 14 0 29 Col2
Upvotes: 1
Reputation: 863166
Use:
#compare last 4 columns by Id column with broadcasting
a = df.columns[-4:].to_numpy() == df['Id'].to_numpy()[:, None]
#print (a)
#shifting by 1 values, check next matching by cumulative sum and compare for not equal
m1 = np.cumsum(a[:, :-1], axis=1) != 0
#compare last 3 columns
m2 = df.iloc[:, -3:].ne(0)
#chain masks by bitwise AND
m = m1 & m2
#get values of index if at least one True per row else 0
df['new'] = np.where(m.any(axis=1), m.idxmax(axis=1), 0)
print (df)
Alpha Id Col1 Col2 Col3 Col4 new
0 a Col3 0 0 10 34 Col4
1 b Col2 0 5 0 4 Col4
2 c Col1 16 0 3 0 Col3
3 d Col2 0 0 0 9 Col4
4 e Col3 0 0 18 0 0
5 f Col1 0 14 0 29 Col2
Upvotes: 3