Reputation: 149
Below is a sample of my current df.
+-------+---+---+---+---+---+---+
| name  | 1 | 2 | 3 | 4 | 5 | 6 |
+-------+---+---+---+---+---+---+
| mark  | a | b | c | d |   |   |
| joe   | a | b |   |   |   |   |
| bob   | a | b | c | d | e | f |
| luke  | a |   |   |   |   |   |
| hana  | a | b | c |   |   |   |
| kelly | a | b | c |   |   |   |
+-------+---+---+---+---+---+---+
My desired output is below, with a "final" column that pulls the value from the right-most column that contains a value.
+-------+---+---+---+---+---+---+-------+
| name  | 1 | 2 | 3 | 4 | 5 | 6 | final |
+-------+---+---+---+---+---+---+-------+
| mark  | a | b | c | d |   |   | d     |
| joe   | a | b |   |   |   |   | b     |
| bob   | a | b | c | d | e | f | f     |
| luke  | a |   |   |   |   |   | a     |
| hana  | a | b | c |   |   |   | c     |
| kelly | a | b | c |   |   |   | c     |
+-------+---+---+---+---+---+---+-------+
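To make the sample reproducible, here is a minimal sketch that rebuilds the frame. It assumes the blank cells are missing values (NaN) and that the column labels are the strings "1" to "6"; the table above does not say which, so both are assumptions (the `rows` dict is just an illustrative helper).

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: blank cells are NaN and column labels are the strings "1".."6".
rows = {
    "mark":  ["a", "b", "c", "d"],
    "joe":   ["a", "b"],
    "bob":   ["a", "b", "c", "d", "e", "f"],
    "luke":  ["a"],
    "hana":  ["a", "b", "c"],
    "kelly": ["a", "b", "c"],
}
cols = [str(i) for i in range(1, 7)]
df = pd.DataFrame(
    # Pad each row with NaN so every row has six value cells.
    [[name] + vals + [np.nan] * (6 - len(vals)) for name, vals in rows.items()],
    columns=["name"] + cols,
)
print(df)
```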
I've tried to use the following if statement (along w/ a thousand other things), but it doesn't seem to work...
if df['2'].isna == True:
    df['final'] = df['1']
elif df['2'].isna == False and df['3'].isna == True:
    df['final'] = df['2']
I figure I need to use a function here? But after several hours of perusing Stack Overflow and experimenting, I'm stuck.
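A sketch of why the if statement above cannot work, on a small three-column frame (blank cells assumed to be NaN). Without parentheses, `df['2'].isna` is the method object, so comparing it to True is simply False. With parentheses you get one boolean per row, and a plain `if` cannot collapse a whole Series into one True/False. The same right-to-left logic can be applied row by row with `numpy.select`; this swaps in a vectorized technique that the question itself does not use, and it only covers columns 1 to 3, mirroring the if/elif chain.

```python
import numpy as np
import pandas as pd

# Small frame shaped like the question (blank cells assumed to be NaN).
df = pd.DataFrame({"name": ["joe", "luke", "kelly"],
                   "1": ["a", "a", "a"],
                   "2": ["b", np.nan, "b"],
                   "3": [np.nan, np.nan, "c"]})

# Without parentheses, df["2"].isna is the bound method itself,
# so this comparison is just the Python bool False.
print(df["2"].isna == True)  # False

# With parentheses you get a boolean Series, and `if` cannot
# reduce a whole Series to a single True/False.
try:
    if df["2"].isna():
        pass
except ValueError as exc:
    print("ValueError:", exc)

# Row-wise version of the if/elif chain: the first matching condition wins,
# so column "3" is preferred, then "2", falling back to "1".
conditions = [df["3"].notna(), df["2"].notna()]
choices = [df["3"], df["2"]]
df["final"] = np.select(conditions, choices, default=df["1"])
print(df)
```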
Upvotes: 1
Views: 825
Reputation: 290
This is a simplified version using ffill:
df['final'] = df.ffill(axis=1).iloc[:, -1]
axis=1 makes ffill work across the columns of each row, carrying every non-null value rightwards into the NaN cells that follow it. After that fill, the last column (iloc[:, -1]) holds the rightmost non-NaN value of each row, which becomes 'final'.
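A minimal sketch of the same ffill idea on a shortened frame. One deliberate change: the fill is restricted to the value columns, so the 'name' column can never end up in 'final' (with the whole frame, a row whose value cells were all NaN would get its name copied across). The `value_cols` list is an assumption about which columns hold values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["mark", "joe", "luke"],
                   "1": ["a", "a", "a"],
                   "2": ["b", "b", np.nan],
                   "3": ["c", np.nan, np.nan],
                   "4": ["d", np.nan, np.nan]})

# Only fill across the value columns, leaving "name" out of it.
value_cols = ["1", "2", "3", "4"]

# ffill(axis=1) copies each non-null value rightwards over the NaNs after it,
# so the last column ends up holding the rightmost non-null value of the row.
filled = df[value_cols].ffill(axis=1)
df["final"] = filled.iloc[:, -1]
print(df)
```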
Output:
    name  1    2    3    4    5    6  final
0   mark  a    b    c    d  NaN  NaN      d
1    joe  a    b  NaN  NaN  NaN  NaN      b
2    bob  a    b    c    d    e    f      f
3   luke  a  NaN  NaN  NaN  NaN  NaN      a
4   hana  a    b    c  NaN  NaN  NaN      c
5  kelly  a    b    c  NaN  NaN  NaN      c
Upvotes: 1
Reputation: 13397
Try:
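df["final"] = df[df.columns[::-1]].bfill(axis=1).iloc[:, 0]
Reversing the columns puts the rightmost one first; bfill(axis=1) then pulls the nearest non-null value leftwards, so column 0 ends up holding the rightmost non-null value of each row.
(*) If the empty cells hold empty strings rather than NaN, convert them first (with numpy imported as np):
df = df.replace("", np.nan)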
Using @nrfd's input, the output is:
import numpy as np
import pandas as pd
df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]},
                  index=['mark', 'james', 'bob'])
df["final"] = df[df.columns[::-1]].bfill(axis=1).iloc[:, 0]
>>> df
       1    2    3  final
mark   a    a    f      f
james  b    d  NaN      d
bob    c  NaN  NaN      c
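Putting the two steps of this answer together, a sketch that assumes the blanks arrive as empty strings and reverses only the value columns (the `value_cols` list is an assumption about the layout):

```python
import numpy as np
import pandas as pd

# Assumption for this sketch: blanks are empty strings, not NaN.
df = pd.DataFrame({"name": ["mark", "joe", "luke"],
                   "1": ["a", "a", "a"],
                   "2": ["b", "b", ""],
                   "3": ["c", "", ""]})

# Step 1: turn empty strings into NaN so bfill treats them as missing.
df = df.replace("", np.nan)

# Step 2: reverse the value columns (rightmost first), back-fill across
# each row, and take column 0, i.e. the rightmost non-null value.
value_cols = ["1", "2", "3"]
df["final"] = df[value_cols[::-1]].bfill(axis=1).iloc[:, 0]
print(df)
```

Without step 1, bfill would treat "" as a real value and return it for joe and luke.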
Upvotes: 0
Reputation: 1
You can write a short function and use apply on each row:
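import numpy as np
import pandas as pd
df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]},
                  index=['mark', 'james', 'bob'])
def first_nan(row):
    # Count the NaNs in the row; this assumes they all sit at the end.
    loc = len(row[row.isna()])
    # Step back past them to the last filled cell (positional, hence .iloc).
    return row.iloc[-(loc + 1)]
df['final'] = df.apply(first_nan, axis=1)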
Output:
       1    2    3  final
mark   a    a    f      f
james  b    d  NaN      d
bob    c  NaN  NaN      c
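first_nan counts the NaNs and assumes they all trail at the end of the row. If a gap could appear mid-row (like james below, where counting one NaN and stepping back two positions would land on the empty column '2'), a sketch using Series.last_valid_index, which returns the label of the last non-null entry, avoids that assumption. The helper name last_value is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', np.nan, np.nan],
                   '3': ['f', 'd', np.nan]},
                  index=['mark', 'james', 'bob'])

def last_value(row):
    # Label of the last non-null entry, or None if the whole row is NaN.
    label = row.last_valid_index()
    return np.nan if label is None else row.loc[label]

df['final'] = df.apply(last_value, axis=1)
print(df)
```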
Upvotes: 0
Reputation: 5740
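You can create a function for apply that picks the value based on the length of each cell's content, checking columns from right to left (here the blanks are empty strings rather than NaN).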
import pandas as pd
df = pd.DataFrame({'name': ['m', 'j', 'k'], 1: ['a', 'a', 'a'],
                   2: ['b', 'b', ''], 3: ['c', '', '']})
def function(row):
    # Check the columns from right to left; blanks are empty strings here.
    if len(row[3]) > 0:
        return row[3]
    elif len(row[2]) > 0:
        return row[2]
    else:
        return row[1]
df['final'] = df.apply(function, axis=1)
Output:
   name  1  2  3  final
0     m  a  b  c      c
1     j  a  b         b
2     k  a            a
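The hard-coded if/elif chain above covers three columns; the question has six. A sketch of the same length-based idea as a loop over the value columns, right to left. The name rightmost_filled and the value_cols list are hypothetical, and like the original it assumes blanks are empty strings (len() would fail on NaN).

```python
import pandas as pd

df = pd.DataFrame({'name': ['m', 'j', 'k'],
                   1: ['a', 'a', 'a'],
                   2: ['b', 'b', ''],
                   3: ['c', '', '']})

value_cols = [1, 2, 3]  # e.g. range(1, 7) for the six columns in the question

def rightmost_filled(row):
    # Walk the value columns from right to left and return the first
    # non-empty string; '' if every value cell is empty.
    for col in reversed(value_cols):
        if len(row.loc[col]) > 0:
            return row.loc[col]
    return ''

df['final'] = df.apply(rightmost_filled, axis=1)
print(df)
```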
Upvotes: 1