ems
ems

Reputation: 990

Pandas find first non-zero entry across multiple columns based on a condition

I have a dataframe like this -

   Alpha    Id   Col1   Col2    Col3    Col4
0   a      Col3   0      0       10      34
1   b      Col2   0      5       0       4
2   c      Col1   16     0       3       0
3   d      Col2   0      0       0       9
4   e      Col3   0      0       18      0
5   f      Col1   0      14      0       29

For each row, I need to find the column name of the first non-zero value occurring after the column specified in ID

So that the resulting dataframe looks like this -

   Alpha    Id     Col1   Col2  Col3    Col4    Result
0   a       Col3    0      0    10        34    Col4
1   b       Col2    0      5    0         4     Col4
2   c       Col1    16     0    3         0     Col3
3   d       Col2    0      0    0         9     Col4
4   e       Col3    0      0    18        0     0
5   f       Col1    0      14   0         29    Col2

I know idxmax() can be used for getting the first non-zero entry in a row. But how can I specify the condition here?

Thanks in advance for any solution/hint.

Upvotes: 3

Views: 519

Answers (3)

smci
smci

Reputation: 33960

For each row, you want to look up the slice of columns starting with the column specified in 'Id', rightwards.

If your spec didn't say first non-zero value occurring after the column specified, we could do the straightforward:

>>> df['Result'] = df.apply(lambda row: row.loc[ row['Id']: ].ne(0).idxmin(), axis=1)
>>> df
  Alpha    Id  Col1  Col2  Col3  Col4 Result
0     a  Col3     0     0    10    34   Col3
1     b  Col2     0     5     0     4   Col3
2     c  Col1    16     0     3     0   Col2
3     d  Col2     0     0     0     9   Col2
4     e  Col3     0     0    18     0   Col4 # <-- wrong, should slice from Col3...
5     f  Col1     0    14     0    29   Col1

But in order to implement first non-zero value occurring after the column specified, we'd need to compute a column-slices using integer indices rather than names:

>>> cols = df.columns[2:]    

Index(['Col1', 'Col2', 'Col3', 'Col4'], dtype='object')
# Now we can do cols.get_loc(row['Id'])

>>> df.apply(lambda row: cols.get_loc(row['Id']), axis=1)
0    2
1    1
2    0
3    1
4    2
5    0

The rest will be something like looking for the index of the leftmost nonzero entry:

df.apply(lambda row: row.iloc[ (df.columns.get_loc(row['Id'])+2) : ].ne(0).idxmin(), axis=1)

Upvotes: 0

kantal
kantal

Reputation: 2407

After a simple, not really pandas way preparation, things become easier:

cols= ["Col1", "Col2", "Col3", "Col4"]
df2=df.copy()
# A simple preparation:
for idx,cc in zip(df2.index,df2.Id):
    df2.at[idx,cc]=0

crit= df2[cols].ne(0)
df["nz"]= crit.idxmax(axis=1)[crit.any(axis=1)]

    Alpha    Id  Col1  Col2  Col3  Col4    nz
0     a  Col3     0     0    10    34  Col4
1     b  Col2     0     5     0     4  Col4
2     c  Col1    16     0     3     0  Col3
3     d  Col2     0     0     0     9  Col4
4     e  Col3     0     0    18     0   NaN
5     f  Col1     0    14     0    29  Col2

Upvotes: 1

jezrael
jezrael

Reputation: 863166

Use:

#compare last 4 columns by Id column with broadcasting
a = df.columns[-4:].to_numpy() == df['Id'].to_numpy()[:, None]
#print (a)

#shifting by 1 values, check next matching by cumulative sum and compare for not equal
m1 = np.cumsum(a[:, :-1], axis=1) != 0
#compare last 3 columns
m2 = df.iloc[:, -3:].ne(0)
#chain masks by bitwise AND
m = m1 & m2
#get values of index if at least one True per row else 0
df['new'] = np.where(m.any(axis=1), m.idxmax(axis=1), 0)
print (df)
  Alpha    Id  Col1  Col2  Col3  Col4   new
0     a  Col3     0     0    10    34  Col4
1     b  Col2     0     5     0     4  Col4
2     c  Col1    16     0     3     0  Col3
3     d  Col2     0     0     0     9  Col4
4     e  Col3     0     0    18     0     0
5     f  Col1     0    14     0    29  Col2

Upvotes: 3

Related Questions