Reputation: 87
I'm a working professional doing my M.Tech and trying to do a project on machine learning. I'm new to Python as well as ML. I have a column called Found that holds multiple values. I want to delete all rows that do not match a specific condition on the Found column.
df['Found']
0 development
1 func-test
2 func-test
3 regression
4 func-test
5 integration
6 func-test
7 func-test
8 regression
9 func-test
I want to keep only the rows whose Found value contains "test" or "regression".
I wrote the following code.
remove_list = []
for x in range(df.shape[0]):
    text = df.iloc[x]['Found']
    if not re.search('test|regression', text, re.I):
        remove_list.append(x)
print(remove_list)
df.drop(remove_list, inplace = True)
print(df)
But remove_list is empty. Am I doing anything wrong here? Or is there a better way of achieving this?
[]
Identifier Status Priority Severity Found Age \
0 Bug 1 V NaN 2 development 1
1 Bug 2 R NaN 6 func-test 203
2 Bug 3 V NaN 2 func-test 9
3 Bug 4 D NaN 3 regression 4
4 Bug 5 V NaN 2 func-test 9
I even tried this, but I get the following error:
for x in range(df.shape[0]):
    if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
        df.drop(x, inplace = True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-77-14f97ad6d00a> in <module>
1 for x in range(df.shape[0]):
----> 2 if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
3 df.drop(x, inplace = True)
~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/re.py in search(pattern, string, flags)
183 """Scan through string looking for a match to the pattern, returning
184 a Match object, or None if no match was found."""
--> 185 return _compile(pattern, flags).search(string)
186
187 def sub(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
Upvotes: 1
Views: 158
Reputation: 41327
You can do this concisely with .str.contains() and boolean indexing:
df = df[df['Found'].str.contains('test|regression')]
# Identifier Status Priority Severity Found Age
# 1 Bug 2 R NaN 6 func-test 203
# 2 Bug 3 V NaN 2 func-test 9
# 3 Bug 4 D NaN 3 regression 4
# 4 Bug 5 V NaN 2 func-test 9
If you need to handle NaN, prepend replace(np.nan, ''):
df = df[df['Found'].replace(np.nan, '').str.contains('test|regression')]
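As an alternative sketch, .str.contains() also takes an na argument, so passing na=False should treat missing values as non-matches without the replace step. The small DataFrame below is made up for illustration (only the Identifier and Found column names come from the question), and `mask` and `filtered` are names I chose here:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing Found value.
df = pd.DataFrame({
    'Identifier': ['Bug 1', 'Bug 2', 'Bug 3', 'Bug 4'],
    'Found': ['development', 'func-test', np.nan, 'regression'],
})

# na=False makes missing values count as "no match",
# so the mask is purely boolean and safe to index with.
mask = df['Found'].str.contains('test|regression', na=False)
filtered = df[mask]
print(filtered)
```

This keeps the Bug 2 and Bug 4 rows and drops both the non-matching row and the row with the missing value.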
And as @sophocles mentioned, you could also make it case-insensitive with case=False:
df = df[df['Found'].str.contains('test|regression', case=False)]
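For what it's worth, the TypeError in your second attempt is consistent with Found holding a non-string value such as NaN, since re.search only accepts str or bytes. I can't see your full data, so treat that as an assumption. Here is a minimal sketch on made-up data that reproduces the error, counts the missing values, and then applies a case-insensitive filter that skips them (`raised`, `n_missing` and `kept` are just illustrative names):

```python
import re
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and one differently-cased match.
df = pd.DataFrame({'Found': ['func-test', np.nan, 'Regression']})

# NaN is a float, so re.search rejects it with a TypeError,
# just like the traceback in the question.
try:
    re.search('test|regression', df.iloc[1]['Found'], re.I)
    raised = False
except TypeError:
    raised = True

# Count the missing values that would trip up a row-by-row loop.
n_missing = int(df['Found'].isna().sum())

# Case-insensitive filter that treats missing values as non-matches.
kept = df[df['Found'].str.contains('test|regression', case=False, na=False)]
print(raised, n_missing)
print(kept)
```

If isna().sum() comes back non-zero on your real column, that would explain why the loop fails while the vectorized filter with na=False still works.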
Upvotes: 1