Reputation: 27
I have a pandas data frame that consists of two columns with value. Some of the values are missing and I would like to create a third column that marks if there are missing values in both columns or if one is filled. I am unsure on how to do this since I am new any help you can provided would be greatly appreciated
#input
df = {'First': ['','','A','B','B','C'],
'Second': ['12', '', '10', '', '', '11']}
df = pd.DataFrame(data = d)
#Possible output of third column
df['Third'] = ['Secondfilled', 'missing', 'bothfilled', 'Firstfilled', 'Firstfilled', bothfilled']
Upvotes: 0
Views: 935
Reputation: 7818
One-line solution with no ifelse nor custom functions. Improved with @SeaBean 's suggestion!
d = {0: 'Missing', 1: 'FirstFilled', 2: 'SecondFilled', 3: 'BothFilled'}
df['Third'] = (df.ne('')*(1,2)).sum(1).map(d)
Output:
print(df)
First Second Third
0 12 SecondFilled
1 Missing
2 A 10 BothFilled
3 B FirstFilled
4 B FirstFilled
5 C 11 BothFilled
Upvotes: 3
Reputation: 30022
You can use apply()
with a lookup dict.
lookup = {'10': 'Firstfilled', '01': 'Secondfilled', '11': 'bothfilled', '00': 'missing'}
def fill(row):
key = '00'
if row['First'] != '':
key = '1' + key[1]
if row['Second'] != '':
key = key[0] + '1'
return lookup[key]
df['Third'] = df.apply(fill, axis=1)
# print(df)
First Second Third
0 12 Secondfilled
1 missing
2 A 10 bothfilled
3 B Firstfilled
4 B Firstfilled
5 C 11 bothfilled
Upvotes: 1