Macterror
Macterror

Reputation: 431

Finding first non-zero entry across columns in Pandas

I have a tricky dataframe in Pandas that needs to stay the way it is, here is an example of the formatting:

entry_name | entry_id | 052018 info_1 | 062018 info_2 | 052018 other_1 | 062018 other_2 |

I need to add a new column that will retrieve the column title of the first non-zero entry for info and another column that will retrieve the column title of the first non-zero entry for other (thus really giving us a date)

Any help on how to do this?

Thanks!

Upvotes: 3

Views: 2353

Answers (2)

jezrael
jezrael

Reputation: 863166

Use filter for select columns with info and other, then add new column called NaN for general solution - this value is in new column if only 0 rows, last compare values by ne (!=) and get column of first True by idxmax:

print (df)
  entry_name  entry_id  052018 info_1  062018 info_2  052018 other_1  \
0          a         1              0              0               1   
1          b         2              0              1               0   
2          c         4              0              0               0   
3          d         5              2              3               4   

   062018 other_2  
0               2  
1               0  
2               0  
3               4  

df1 = df.filter(like='info').copy()
df1[np.nan] = 1
df['info'] = df1.ne(0).idxmax(axis=1)

df2 = df.filter(like='other').copy()
df2[np.nan] = 1
df['other'] = df2.ne(0).idxmax(axis=1)
print (df)
  entry_name  entry_id  052018 info_1  062018 info_2  052018 other_1  \
0          a         1              0              0               1   
1          b         2              0              1               0   
2          c         4              0              0               0   
3          d         5              2              3               4   

   062018 other_2           info           other  
0               2            NaN  052018 other_1  
1               0  062018 info_2             NaN  
2               0            NaN             NaN  
3               4  052018 info_1  052018 other_1  

Upvotes: 3

tif
tif

Reputation: 1484

Let us generate a DataFrame simulating yours:

>>> df=pd.DataFrame(random.randint(3,size=(8,5)),columns=list("abcde"))
>>> df
   a  b  c  d  e
0  2  2  1  2  2
1  1  2  2  0  2
2  0  2  1  0  2
3  2  1  1  0  1
4  0  1  0  0  0
5  0  0  2  0  1
6  2  0  1  1  2
7  2  0  0  0  0

Then numpy.argmax() can be used to find the position of the first nonzero value in each row (you can modify the condition for nonempty string etc.):

>>> argmax(df.values!=0,axis=1)
array([0, 0, 1, 0, 1, 2, 0, 0], dtype=int64)

Let us append this field to df after converting it into the values into the header of df:

>>> df['first_nonzero']=df.keys()[argmax(df.values!=0,axis=1)]
>>> df
   a  b  c  d  e first_nonzero
0  2  2  1  2  2             a
1  1  2  2  0  2             a
2  0  2  1  0  2             b
3  2  1  1  0  1             a
4  0  1  0  0  0             b
5  0  0  2  0  1             c
6  2  0  1  1  2             a
7  2  0  0  0  0             a

Voila!

Upvotes: 3

Related Questions