Sai

Reputation: 87

Trying to delete rows from a pandas DataFrame based on a condition on a column

I'm a working professional doing my MTech and working on a machine learning project. I'm new to Python as well as ML. I have a column called Found that holds multiple values. I want to delete all the rows that do not match a specific condition on the Found column.

    df['Found']

0          developement
1          func-test
2          func-test
3         regression
4          func-test
5        integration
6          func-test
7          func-test
8         regression
9          func-test

I want to keep the rows whose Found value contains "test" or "regression".

I wrote the following code.

    remove_list = []
    for x in range(df.shape[0]):
        text = df.iloc[x]['Found']
        if not re.search('test|regression', text, re.I):
            remove_list.append(x)
    print(remove_list)
    df.drop(remove_list, inplace = True)
    print(df)

But remove_list is empty. Am I doing anything wrong here? Or is there a better way of achieving this?

[]
      Identifier Status  Priority  Severity         Found       Age  \
0     Bug 1      V       NaN         2         development        1   
1     Bug 2      R       NaN         6         func-test         203   
2     Bug 3      V       NaN         2         func-test          9   
3     Bug 4      D       NaN         3        regression          4   
4     Bug 5      V       NaN         2        func-test           9   

I even tried this, but I get the following error:

    for x in range(df.shape[0]):
        if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
            df.drop(x, inplace = True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-14f97ad6d00a> in <module>
      1 for x in range(df.shape[0]):
----> 2     if not re.search('test|regression|customer', df.iloc[x]['Found'], re.I):
      3         df.drop(x, inplace = True)

~/Desktop/Anaconda/anaconda3/envs/nlp_course/lib/python3.7/re.py in search(pattern, string, flags)
    183     """Scan through string looking for a match to the pattern, returning
    184     a Match object, or None if no match was found."""
--> 185     return _compile(pattern, flags).search(string)
    186 
    187 def sub(pattern, repl, string, count=0, flags=0):

TypeError: expected string or bytes-like object

Upvotes: 1

Views: 158

Answers (1)

tdy

Reputation: 41327

You can do this concisely with .str.contains() and boolean indexing:

df = df[df['Found'].str.contains('test|regression')]

#   Identifier Status  Priority  Severity       Found  Age
# 1      Bug 2      R       NaN         6   func-test  203
# 2      Bug 3      V       NaN         2   func-test    9
# 3      Bug 4      D       NaN         3  regression    4
# 4      Bug 5      V       NaN         2   func-test    9
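For anyone who wants to try this without the original file, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the five rows printed in the question (only two columns are kept), and the names `mask` and `filtered` are just illustrative:

```python
import pandas as pd

# Sample data copied from the rows shown in the question (two columns only)
df = pd.DataFrame({
    "Identifier": ["Bug 1", "Bug 2", "Bug 3", "Bug 4", "Bug 5"],
    "Found": ["development", "func-test", "func-test", "regression", "func-test"],
})

# Boolean mask: True where Found contains "test" or "regression"
mask = df["Found"].str.contains("test|regression")

# Boolean indexing keeps only the rows where the mask is True
filtered = df[mask]
print(filtered)
```

The mask is computed for the whole column at once, so there is no need to loop over rows or to collect indices for `drop()`.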

If you need to handle NaN, insert replace(np.nan, '') before .str (with NumPy imported as np):

df = df[df['Found'].replace(np.nan, '').str.contains('test|regression')]
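A missing value in Found would also be consistent with the TypeError in the question, since `re.search` only accepts strings and NaN is a float. As an alternative to replacing NaN first, `str.contains` takes an `na` argument that sets the result for missing values. A small sketch, using a made-up Series rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
found = pd.Series(["func-test", np.nan, "regression", "integration"])

# na=False treats missing values as "no match" instead of propagating NaN
mask = found.str.contains("test|regression", na=False)

kept = found[mask]
print(kept)
```

Without `na=False`, the mask itself would contain a missing value, which cannot be used for boolean indexing.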

And as @sophocles mentioned, you could also make it case-insensitive with case=False:

df = df[df['Found'].str.contains('test|regression', case=False)]
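If you would rather think of this as deleting rows, as in the question, the same mask can be inverted and passed to `drop()`. This is only a sketch with invented sample values chosen to show the case-insensitive match:

```python
import pandas as pd

# Hypothetical values with mixed casing
df = pd.DataFrame({"Found": ["Func-Test", "Integration", "REGRESSION", "development"]})

# Case-insensitive match on "test" or "regression"
mask = df["Found"].str.contains("test|regression", case=False)

# Drop by index label the rows that do NOT match
df = df.drop(df.index[~mask])
print(df)
```

Both forms give the same rows; boolean indexing is simply shorter.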

Upvotes: 1
