Reputation: 431
I have a tricky dataframe in Pandas that needs to stay the way it is, here is an example of the formatting:
entry_name | entry_id | 052018 info_1 | 062018 info_2 | 052018 other_1 | 062018 other_2 |
I need to add a new column that will retrieve the column title of the first non-zero entry for info and another column that will retrieve the column title of the first non-zero entry for other (thus really giving us a date)
Any help on how to do this?
Thanks!
Upvotes: 3
Views: 2353
Reputation: 863166
Use filter
for select columns with info
and other
, then add new column called NaN
for general solution - this value is in new column if only 0
rows, last compare values by ne
(!=
) and get column of first True
by idxmax
:
print (df)
entry_name entry_id 052018 info_1 062018 info_2 052018 other_1 \
0 a 1 0 0 1
1 b 2 0 1 0
2 c 4 0 0 0
3 d 5 2 3 4
062018 other_2
0 2
1 0
2 0
3 4
df1 = df.filter(like='info').copy()
df1[np.nan] = 1
df['info'] = df1.ne(0).idxmax(axis=1)
df2 = df.filter(like='other').copy()
df2[np.nan] = 1
df['other'] = df2.ne(0).idxmax(axis=1)
print (df)
entry_name entry_id 052018 info_1 062018 info_2 052018 other_1 \
0 a 1 0 0 1
1 b 2 0 1 0
2 c 4 0 0 0
3 d 5 2 3 4
062018 other_2 info other
0 2 NaN 052018 other_1
1 0 062018 info_2 NaN
2 0 NaN NaN
3 4 052018 info_1 052018 other_1
Upvotes: 3
Reputation: 1484
Let us generate a DataFrame
simulating yours:
>>> df=pd.DataFrame(random.randint(3,size=(8,5)),columns=list("abcde"))
>>> df
a b c d e
0 2 2 1 2 2
1 1 2 2 0 2
2 0 2 1 0 2
3 2 1 1 0 1
4 0 1 0 0 0
5 0 0 2 0 1
6 2 0 1 1 2
7 2 0 0 0 0
Then numpy.argmax()
can be used to find the position of the first nonzero value in each row (you can modify the condition for nonempty string etc.):
>>> argmax(df.values!=0,axis=1)
array([0, 0, 1, 0, 1, 2, 0, 0], dtype=int64)
Let us append this field to df
after converting it into the values into the header of df
:
>>> df['first_nonzero']=df.keys()[argmax(df.values!=0,axis=1)]
>>> df
a b c d e first_nonzero
0 2 2 1 2 2 a
1 1 2 2 0 2 a
2 0 2 1 0 2 b
3 2 1 1 0 1 a
4 0 1 0 0 0 b
5 0 0 2 0 1 c
6 2 0 1 1 2 a
7 2 0 0 0 0 a
Voila!
Upvotes: 3