Reputation: 2533
Assume this dataset:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'William', 'Nancy', 'Susan', 'Robert', 'Lucy', 'Blake', 'Sally', 'Bruce', 'Mike', 'Jeff'],
    'injury': ['right hand broken', 'lacerated left foot', 'foot broken', 'right foot fractured', '', 'sprained finger', 'chest pain', 'swelling in arm', 'laceration to arms, hands, and foot', np.nan, 'swollen cheek']
})
name injury
0 John right hand broken
1 William lacerated left foot
2 Nancy foot broken
3 Susan right foot fractured
4 Robert
5 Lucy sprained finger
6 Blake chest pain
7 Sally swelling in arm
8 Bruce laceration to arms, hands, and foot
9 Mike NaN
10 Jeff swollen cheek
I reduce each injury to only the selected body parts:
selected_words = ["hand", "foot", "finger", "chest", "arms", "arm", "hands"]
df["injury"] = (
df["injury"]
.str.replace(",", "")
.str.split(" ", expand=False)
.apply(lambda x: ", ".join(set([i for i in x if i in selected_words])))
)
But this throws an error on the NaN value at index 9:
TypeError: 'float' object is not iterable
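A minimal reproduction shows where the float comes from. It uses a two-row Series as a stand-in for the full frame (an assumption for brevity): .str.split() turns each string into a list but leaves a missing value as a float NaN, and iterating over that float is what fails.

```python
import numpy as np
import pandas as pd

s = pd.Series(["right hand broken", np.nan])
split = s.str.split(" ")

# The string row becomes a list, but the missing row stays a float NaN.
print(split[0])        # ['right', 'hand', 'broken']
print(type(split[1]))  # <class 'float'>

# Iterating over that float is what fails inside the comprehension.
try:
    [w for w in split[1]]
except TypeError as err:
    print(err)  # 'float' object is not iterable
```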
How would I modify the list comprehension so that it:
- checks for any NaN values, and
- outputs NaN if it encounters a row that is blank or does not contain any body part from selected_words (e.g. index 10)?
The desired output is:
name injury
0 John hand
1 William foot
2 Nancy foot
3 Susan foot
4 Robert NaN
5 Lucy finger
6 Blake chest
7 Sally arm
8 Bruce hand, foot, arm
9 Mike NaN
10 Jeff NaN
I tried the following:
.apply(lambda x: ", ".join(set([i for i in x if i in selected_words and i is not np.nan else np.nan])))
But the syntax is incorrect.
Any assistance would be most appreciated. Thanks!
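For reference, a comprehension has two different "if" forms, which is why the attempt above is a syntax error: a trailing "if" is a filter and cannot take an "else", while "a if cond else b" is a conditional expression that must come before "for". The sample words below are illustrative only.

```python
import numpy as np

words = ["right", "hand", "broken"]
selected_words = ["hand", "foot"]

# Filter clause: the trailing "if" only decides which items to keep; no "else" allowed.
kept = [w for w in words if w in selected_words]
print(kept)  # ['hand']

# Conditional expression: "a if cond else b" goes before "for" and maps every item.
mapped = [w if w in selected_words else np.nan for w in words]
print(mapped)  # [nan, 'hand', nan]
```

Neither form helps when the whole row is NaN, because the comprehension still tries to iterate over it; the NaN check has to happen before the comprehension runs.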
Upvotes: 3
Views: 82
Reputation: 64
You can use .dropna() before the lambda:
df["injury"].str.replace(",", "").str.split(" ", expand=False).dropna().apply(lambda x: ", ".join(set([i for i in x if i in selected_words])))
0 hand
1 foot
2 foot
3 foot
4
5 finger
6 chest
7 arm
8 foot, hands, arms
Was this the result you wanted?
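One thing worth checking with the .dropna() approach is what happens when the result is assigned back to the frame. The sketch below uses a three-row subset of the question's data (an assumption to keep it short) and shows that assignment aligns on the index: the dropped NaN row comes back as NaN, but the blank row keeps an empty string rather than NaN.

```python
import numpy as np
import pandas as pd

selected_words = ["hand", "foot", "finger", "chest", "arms", "arm", "hands"]
df = pd.DataFrame({
    "name": ["Robert", "Lucy", "Mike"],
    "injury": ["", "sprained finger", np.nan],
}, index=[4, 5, 9])

parts = (
    df["injury"]
    .str.replace(",", "")
    .str.split(" ", expand=False)
    .dropna()  # the NaN row (index 9) is removed before the lambda runs
    .apply(lambda x: ", ".join(set(i for i in x if i in selected_words)))
)

# Assigning back aligns on the index, so the dropped row comes back as NaN,
# while the blank row still holds an empty string.
df["injury"] = parts
print(df)
```

To match the desired output for the blank row, the empty strings would still need converting, for example with df["injury"].replace("", np.nan).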
Upvotes: 1
Reputation: 27567
Your problem isn't that i is np.nan; it's that x is, and you can't iterate over np.nan (a float) with a comprehension.
I think you probably want to turn your lambda into a named function and pass that like so:
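import numpy as np
import pandas as pd

def get_set_of_body_parts(words):
    # After .str.split(), a missing row is still a float NaN, not a list
    if isinstance(words, float):
        return np.nan
    return ", ".join(set([i for i in words if i in selected_words]))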
df = pd.DataFrame({
    'name': ['John', 'William', 'Nancy', 'Susan', 'Robert', 'Lucy', 'Blake', 'Sally', 'Bruce', 'Mike', 'Jeff'],
    'injury': ['right hand broken', 'lacerated left foot', 'foot broken', 'right foot fractured', '', 'sprained finger', 'chest pain', 'swelling in arm', 'laceration to arms, hands, and foot', np.nan, 'swollen cheek']
})
selected_words = ["hand", "foot", "finger", "chest", "arms", "arm", "hands"]
df["injury"] = (
df["injury"]
.str.replace(",", "")
.str.split(" ", expand=False)
.apply(get_set_of_body_parts)
)
But if you really want a lambda, you could write:
.apply(lambda x: np.nan if isinstance(x, float) else ", ".join(set([i for i in x if i in selected_words])))
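The versions above handle the NaN row, but they return an empty string for the blank row (index 4) and the no-match row (index 10), keep plural forms such as "hands", and join a set, so the order of parts is not guaranteed. Below is a sketch that covers every requirement in the question. The plurals mapping and the extract_body_parts name are hypothetical, and ordering the output by a canonical body_parts list is an assumption chosen because it reproduces the desired "hand, foot, arm" for Bruce.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'William', 'Nancy', 'Susan', 'Robert', 'Lucy', 'Blake',
             'Sally', 'Bruce', 'Mike', 'Jeff'],
    'injury': ['right hand broken', 'lacerated left foot', 'foot broken',
               'right foot fractured', '', 'sprained finger', 'chest pain',
               'swelling in arm', 'laceration to arms, hands, and foot',
               np.nan, 'swollen cheek'],
})

# Canonical body parts, in the order they should appear in the output.
body_parts = ["hand", "foot", "finger", "chest", "arm"]
# Hypothetical mapping that folds plural forms onto their singular.
plurals = {"hands": "hand", "arms": "arm"}

def extract_body_parts(text):
    # NaN (a float) and any other non-string value map straight to NaN.
    if not isinstance(text, str):
        return np.nan
    words = text.replace(",", "").split()
    found = {plurals.get(w, w) for w in words} & set(body_parts)
    # Blank rows and rows with no matching body part also become NaN.
    if not found:
        return np.nan
    # Emit parts in canonical order so the output is deterministic.
    return ", ".join(p for p in body_parts if p in found)

df["injury"] = df["injury"].apply(extract_body_parts)
print(df)
```

Checking for a string (rather than testing for NaN) also covers any other non-string value, and building the output from body_parts instead of a set keeps the result stable from run to run.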
Upvotes: 1