python pandas delete row on string condition

Question

I have a DataFrame with columns of strings and integers. In one of the string columns I want to search every value for a specific substring, let's say "abc", and delete the row if the substring is present. How do I do that? It sounds easy, but somehow I am struggling with it. The substring is always the last three characters. I tried the following:

df1 = df.drop(df[df.Hostname[-4:]== "abc"])

which gives me

UserWarning: Boolean Series key will be reindexed to match DataFrame index

So I tried to go through the values in that column and keep only those that do not have "abc" at the end:

red = [c for c in df.Hostname[-4:] if c != 'abc']

which gives me

KeyError('%s not in index' % objarr[mask])

What am I doing wrong?

Thanks for your help!

jezrael · Accepted Answer

Use boolean indexing, slicing through the str accessor to check the last 4 (or 3) characters of the Hostname column, and change the condition from == to != so that the matching rows are dropped:

df1 = df[df.Hostname.str[-4:] != "abc"]

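Or, since "abc" is three characters long and a four-character slice can only equal it when the whole value is "abc", compare the last three characters: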

df1 = df[df.Hostname.str[-3:] != "abc"]

Sample:

import pandas as pd

df = pd.DataFrame({'Hostname':['k abc','abc','dd'],
                  'b':[1,2,3],
                  'c':[4,5,6]})
print (df)
  Hostname  b  c
0    k abc  1  4
1      abc  2  5
2       dd  3  6

df1 = df[df.Hostname.str[-3:] != "abc"]
print (df1)
  Hostname  b  c
2       dd  3  6
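
As for why the attempts in the question fail: as far as I can tell, df.Hostname[-4:] slices the rows of the column, not the characters of each string, so the comparison never looks at the end of the hostnames. A minimal sketch of the difference, reusing the sample frame above (the variable names row_slice and char_slice are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Hostname': ['k abc', 'abc', 'dd'],
                   'b': [1, 2, 3],
                   'c': [4, 5, 6]})

# Slicing the Series itself selects rows: the last 4 rows,
# which for this 3-row frame is simply every row.
row_slice = df.Hostname[-4:]

# Slicing through the .str accessor selects characters inside each string.
char_slice = df.Hostname.str[-3:]

print(row_slice.tolist())   # ['k abc', 'abc', 'dd']
print(char_slice.tolist())  # ['abc', 'abc', 'dd']
```

Only the second form produces something that can sensibly be compared with "abc" row by row.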

str.endswith also works if you only need to check the trailing characters:

df1 = df[~df.Hostname.str.endswith("abc")]
print (df1)
  Hostname  b  c
2       dd  3  6
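
One thing to watch for, which the question does not mention and is only an assumption here: if Hostname can contain missing values, the mask may contain NaN and negating it with ~ can fail. A small sketch using the na parameter of str.endswith, with a made-up frame that has one missing hostname:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the NaN row is an assumption for illustration only.
df = pd.DataFrame({'Hostname': ['k abc', 'abc', np.nan, 'dd'],
                   'b': [1, 2, 3, 4]})

# na=False treats missing hostnames as "does not end with abc",
# so the mask is purely boolean and safe to negate.
mask = df.Hostname.str.endswith('abc', na=False)
df1 = df[~mask]

print(df1)
```

With na=False the row with the missing hostname is kept; pass na=True instead if such rows should be dropped as well.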

EDIT:

If you need to check whether abc appears anywhere in the last 4 characters and remove those rows, first extract the last 4 characters and then use str.contains:

df1 = df[~df.Hostname.str[-4:].str.contains('abc')]
print (df1)
  Hostname  b  c
2       dd  3  6
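
To see where this differs from endswith, here is a rough sketch with made-up hostnames (none of them come from the question). str.contains treats the pattern as a regular expression by default, so regex=False is passed to keep the match literal:

```python
import pandas as pd

# Hypothetical hostnames chosen to show the edge cases.
df = pd.DataFrame({'Hostname': ['srv-abc1', 'srv-abc', 'abc-srv1', 'dd']})

# Last 4 characters of each value: 'abc1', '-abc', 'srv1', 'dd'
tail = df.Hostname.str[-4:]

# Literal substring test on the tail only.
mask = tail.str.contains('abc', regex=False)
df1 = df[~mask]

print(df1.Hostname.tolist())  # ['abc-srv1', 'dd']
```

Here 'srv-abc1' is removed even though it does not end with abc, while 'abc-srv1' is kept because the substring is outside the last 4 characters.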

EDIT1:

To get a default index after filtering, add reset_index. Python counts from 0, so the index values become 0, 1, 2, ...:

df1 = df[df.Hostname.str[-3:] != "abc"].reset_index(drop=True)
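
A short sketch of what that changes, using a slightly larger made-up frame so that the gaps in the index are visible (the names filtered and renumbered are only for illustration):

```python
import pandas as pd

# Hypothetical data with matching rows in the middle of the frame.
df = pd.DataFrame({'Hostname': ['k abc', 'dd', 'abc', 'ee'],
                   'b': [1, 2, 3, 4]})

# Filtering keeps the original row labels, leaving gaps.
filtered = df[df.Hostname.str[-3:] != 'abc']
print(filtered.index.tolist())    # [1, 3]

# drop=True discards the old labels instead of adding them as a column.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```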
