Raven

Reputation: 859

Flag Exact Word Within a String Using String Contains

I have a dataset that looks like this:

ID Symptoms
1  ear, fever
2  hearing loss
3  hurt ear
4  spear wound
5  bad hearing  
6  earring cut

I want to flag only the records where "ear" appears. So for example, the output would look like this:

ID Symptoms         Ear
1  ear, fever        1
2  hearing loss      0
3  hurt ear          1
4  spear wound       0
5  bad hearing       0 
6  earring cut       0

I've tried a few variations with little success.

Issue: this code flags anything containing the substring "ear", including "hearing" and "spear":

LABS_TAT.loc[:,"Ear"]=np.where(LABS_TAT["Symptoms"].str.contains("ear", case=False),1,0)

Notice the space after "ear " here. This code does not flag the record "hurt ear":

LABS_TAT.loc[:,"Ear"]=np.where(LABS_TAT["Symptoms"].str.contains("ear ", case=False),1,0)

Notice the space before " ear" here. This code does not flag the record "ear, fever":

LABS_TAT.loc[:,"Ear"]=np.where(LABS_TAT["Symptoms"].str.contains(" ear", case=False),1,0)

How can I fix my code so that it flags only the records containing the whole word "ear"? I feel like there is a simple answer, but I'm still fairly new to Python.

Upvotes: 1

Views: 326

Answers (2)

ig0774

Reputation: 41277

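Since .str.contains() treats its pattern as a regular expression by default, this should be as easy as .str.contains(r"\bear\b", case=False).

\b matches a word boundary: a zero-width position between a word character (letter, digit, or underscore) and a non-word character or the start or end of the string. You can read more about regular expressions in the Python standard library documentation.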
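To see how the word-boundary pattern behaves on the strings from the question, here is a minimal sketch using only the standard library's re module. The names symptoms, pattern, and flags are just illustrative.

```python
import re

# Symptom strings copied from the question's sample data.
symptoms = [
    "ear, fever",
    "hearing loss",
    "hurt ear",
    "spear wound",
    "bad hearing",
    "earring cut",
]

# \bear\b matches "ear" only when it is not glued to other word characters.
pattern = re.compile(r"\bear\b", flags=re.IGNORECASE)

# 1 if the whole word "ear" occurs anywhere in the string, else 0.
flags = [1 if pattern.search(s) else 0 for s in symptoms]
print(flags)  # [1, 0, 1, 0, 0, 0]
```

The comma in "ear, fever" and the end of the string in "hurt ear" both count as boundaries, which is why no special handling for spaces or punctuation is needed.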

Upvotes: 1

Shubham Sharma

Reputation: 71689

Use Series.str.contains with a regex pattern:

df['Ear'] = df['Symptoms'].str.contains(r'(?i)\bear\b').astype(int)

Result:

  ID      Symptoms   Ear
0   1    ear, fever    1
1   2  hearing loss    0
2   3      hurt ear    1
3   4   spear wound    0
4   5   bad hearing    0
5   6   earring cut    0
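One caveat worth checking in real data: if Symptoms contains missing values, str.contains returns NaN for those rows and astype(int) will fail. A possible workaround, sketched here on a small made-up frame (the missing value and the mixed-case row are assumptions for illustration, not part of the question's data), is to pass na=False:

```python
import pandas as pd

# Hypothetical data: one missing value and one mixed-case entry added for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Symptoms": ["ear, fever", "hearing loss", None, "Hurt EAR"],
})

# na=False treats missing values as "no match" so the result is purely boolean;
# case=False plays the same role as the inline (?i) flag.
df["Ear"] = df["Symptoms"].str.contains(r"\bear\b", case=False, na=False).astype(int)
print(df)
```

With this, the row with the missing value is flagged 0 instead of raising an error during the integer conversion.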

Upvotes: 1
