TylerNG
TylerNG

Reputation: 941

Pandas taking values in columns order

Given this df:

Name  i   j   k   
A     1   0   3
B     0   5   4   
C     0   0   4
D     0       5

My goal is to add in a column "Final" that takes value in an order of i j k:

Name  i   j   k     Final
A     1   0   3     1      
B     0   5   4     5 
C     0   0   4     4
D     0       5            <-- this one is tricky.  We do count the null for j column here. 

Here is my attempt: df['Final'] = df[['i', 'j', 'k'].bfill(axis=1).iloc[:, 0]. This doesn't work since it always takes the value of column 1. Any help would be appreciated. :)

Many thanks!

Upvotes: 0

Views: 58

Answers (3)

Haleemur Ali
Haleemur Ali

Reputation: 28303

Using pandas.Series.nonzero the solution can be expressed succicntly.

df['Final'] = df.apply(lambda x: x.iloc[x.nonzero()[0][0]], axis=1)

How this works:

nonzero() returns the indices of elements that are not zero (and will match np.nan as well).

We take the first index location and return the value at that location to construct the Final Column.

We apply this on the dataframe using axis=1 to apply it row by row.

A benefit of this approach is that it does not depend on naming individual columns ['i', 'j', 'k']

Upvotes: 1

DSM
DSM

Reputation: 353359

If by "taking values in column order", you mean "taking the first non-zero value in each row, or zero if all values are zero", you could use DataFrame.lookup after doing a boolean comparison:

In [113]: df["final"] = df.lookup(df.index,(df[["i","j","k"]] != 0).idxmax(axis=1))

In [114]: df
Out[114]: 
  Name  i    j  k  final
0    A  1  0.0  3    1.0
1    B  0  5.0  4    5.0
2    C  0  0.0  4    4.0
3    D  0  NaN  5    NaN

where first we compare everything with zero:

In [115]: df[["i","j","k"]] != 0
Out[115]: 
       i      j     k
0   True  False  True
1  False   True  True
2  False  False  True
3  False   True  True

and then we use idxmax to find the first True (or the first False if you have a row of zeroes):

In [116]: (df[["i","j","k"]] != 0).idxmax(axis=1)
Out[116]: 
0    i
1    j
2    k
3    j
dtype: object

Upvotes: 2

BENY
BENY

Reputation: 323326

Is this what you need ?

df['Final']=df[['i', 'j', 'k']].mask((df=='')|(df==0)).bfill(axis=1).iloc[:, 0][(df!='').all(1)]
df
Out[1290]: 
  Name  i  j  k  Final
0    A  1  0  3    1.0
1    B  0  5  4    5.0
2    C  0  0  4    4.0
3    D  0     5    NaN

Upvotes: 1

Related Questions