Colin Sorensen

Reputation: 149

Get the value from the right-most non-empty column and store it in a new column?

Below is a current sample df.

+-------+---+---+----+---+---+---+
| name  | 1 | 2 | 3  | 4 | 5 | 6 |
+-------+---+---+----+---+---+---+
| mark  | a | b | c  | d |   |   |
| joe   | a | b |    |   |   |   |
| bob   | a | b | c  | d | e | f |
| luke  | a |   |    |   |   |   |
| hana  | a | b | c  |   |   |   |
| kelly | a | b | c  |   |   |   |
+-------+---+---+----+---+---+---+

My desired output is below, with a "final" column that pulls the value from the right-most column that contains a value.

+-------+---+---+----+---+---+---+-------+
| name  | 1 | 2 | 3  | 4 | 5 | 6 | final |
+-------+---+---+----+---+---+---+-------+
| mark  | a | b | c  | d |   |   | d     |
| joe   | a | b |    |   |   |   | b     |
| bob   | a | b | c  | d | e | f | f     |
| luke  | a |   |    |   |   |   | a     |
| hana  | a | b | c  |   |   |   | c     |
| kelly | a | b | c  |   |   |   | c     |
+-------+---+---+----+---+---+---+-------+

I've tried to use the following if statement (along w/ a thousand other things), but it doesn't seem to work...

if df['2'].isna == True: 
  df['final']=df['1']
elif df['2'].isna == False and df['3'].isna == True: 
  df['final']=df['2']

I figure I need to use a function here? But after several hours of perusing stack overflow and experimenting, I'm stuck.

Upvotes: 1

Views: 825

Answers (4)

Thomas Gamsjäger

Reputation: 290

This is a simplified version using ffill:

df['final'] = df.ffill(axis=1).iloc[:, -1]

axis=1 tells pandas to fill along each row, from left to right, so every NaN takes the last non-null value to its left. After the fill, the last column holds each row's right-most non-null value, and iloc[:, -1] selects that column.

Output:

    name    1   2   3   4   5   6   final
0   mark    a   b   c   d   NaN NaN     d
1   joe     a   b   NaN NaN NaN NaN     b
2   bob     a   b   c   d   e   f       f
3   luke    a   NaN NaN NaN NaN NaN     a
4   hana    a   b   c   NaN NaN NaN     c
5   kelly   a   b   c   NaN NaN NaN     c
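To make the mechanics visible, here is a minimal sketch that rebuilds the question's frame and performs the fill in two steps. The names value_cols and filled are only illustrative, and restricting the fill to the value columns is my own assumption: it keeps name from being carried into final if a row had no values at all.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question, with NaN for the empty cells.
df = pd.DataFrame({
    "name": ["mark", "joe", "bob", "luke", "hana", "kelly"],
    "1": ["a", "a", "a", "a", "a", "a"],
    "2": ["b", "b", "b", np.nan, "b", "b"],
    "3": ["c", np.nan, "c", np.nan, "c", "c"],
    "4": ["d", np.nan, "d", np.nan, np.nan, np.nan],
    "5": [np.nan, np.nan, "e", np.nan, np.nan, np.nan],
    "6": [np.nan, np.nan, "f", np.nan, np.nan, np.nan],
})

# Illustrative name: only these columns take part in the fill.
value_cols = ["1", "2", "3", "4", "5", "6"]

# Step 1: carry each row's last non-null value rightward.
filled = df[value_cols].ffill(axis=1)

# Step 2: the last column of the filled frame is the right-most value per row.
df["final"] = filled.iloc[:, -1]

print(df)
```

Printing filled on its own shows that column 6 already contains the answer for every row, which is why picking the last column is enough.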

Upvotes: 1

Georgina Skibinski

Reputation: 13397

Try:

df["final"]=df[df.columns[::-1]].bfill(axis=1).iloc[:,0]

(*) If the empty cells hold empty strings rather than NaN, convert them first:

df=df.replace("", np.nan)

Using @nrfd's input, the output is:

df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])

df["final"]=df[df.columns[::-1]].bfill(axis=1).iloc[:,0]

>>> df
       1    2    3 final
mark   a    a    f     f
james  b    d  NaN     d
bob    c  NaN  NaN     c
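For the empty-string case, a sketch along these lines may help. The frame raw and the whitespace-only cell are made up for illustration, and using a regex with replace is one option that also catches cells holding only spaces, which the plain "" replacement would miss.

```python
import numpy as np
import pandas as pd

# Hypothetical input where the blanks are strings, not NaN.
raw = pd.DataFrame({'1': ['a', 'b', 'c'],
                    '2': ['a', 'd', ''],
                    '3': ['f', ' ', '']
                    }, index=['mark', 'james', 'bob'])

# Turn empty or whitespace-only strings into NaN so bfill can skip them.
cleaned = raw.replace(r'^\s*$', np.nan, regex=True)

# Reverse the column order, back-fill along each row, take the first column.
cleaned['final'] = cleaned[cleaned.columns[::-1]].bfill(axis=1).iloc[:, 0]

print(cleaned)
```

Without the replace step, bfill treats '' as a valid value and bob would end up with an empty final.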

Upvotes: 0

nrfd

Reputation: 1

You can write a short function and use apply on each row:

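import numpy as np
import pandas as pd

df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])


def first_nan(row):
    # Count the NaN cells in the row; this assumes they are all trailing.
    loc = len(row[row.isna()])
    # Step back past them by position to reach the last filled cell.
    return row.iloc[-(loc + 1)]


df['final'] = df.apply(first_nan, axis=1)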

Output:

       1    2    3 final
mark   a    a    f     f
james  b    d  NaN     d
bob    c  NaN  NaN     c
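One caveat: first_nan counts the NaN cells and indexes back from the end, so it relies on all missing cells being at the end of the row. If gaps can appear in the middle, or a row can be entirely empty, a variant like the one below might be safer. The helper name last_valid and the sample data are my own, not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'mark' has a gap in the middle, 'bob' has no values.
df = pd.DataFrame({'1': ['a', 'b', np.nan],
                   '2': [np.nan, 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])


def last_valid(row):
    # Drop the missing cells, then take the last remaining one by position.
    valid = row.dropna()
    return valid.iloc[-1] if not valid.empty else np.nan


df['final'] = df.apply(last_valid, axis=1)

print(df)
```

Here mark gets 'f' even though column 2 is empty, and bob gets NaN instead of raising an error.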

Upvotes: 0

Zaraki Kenpachi

Reputation: 5740

You need to write a function for apply that picks the value based on content length.

import pandas as pd

df = pd.DataFrame({'name':['m','j','k'], 1:['a','a','a'], 2:['b','b',''], 3:['c','','']})

def function(row):
    if len(row[3]) > 0:
        return row[3]
    elif len(row[2]) > 0:
        return row[2]
    else:
        return row[1]

df['final'] = df.apply(function, axis=1)

Output:

  name  1  2  3 final
0    m  a  b  c     c
1    j  a  b        b
2    k  a           a
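With six value columns, as in the question, the if/elif chain gets long. The same idea can be written as a loop over the row from right to left. This is only a sketch: last_non_empty and value_cols are names I made up, and it assumes every cell is a string, as in the example above.

```python
import pandas as pd

df = pd.DataFrame({'name': ['m', 'j', 'k'],
                   1: ['a', 'a', 'a'],
                   2: ['b', 'b', ''],
                   3: ['c', '', '']})

# Illustrative name: the columns to scan, in left-to-right order.
value_cols = [1, 2, 3]


def last_non_empty(row):
    # Walk the row from right to left and return the first non-empty string.
    for value in reversed(row.tolist()):
        if len(value) > 0:
            return value
    # Every cell was empty.
    return ''


df['final'] = df[value_cols].apply(last_non_empty, axis=1)

print(df)
```

Adding more columns then only means extending value_cols.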

Upvotes: 1
