Reputation: 77
I have a data frame with columns of strings and integers. In one of the string columns I want to search every value for a specific substring, let's say "abc", and delete the row if the substring is present. How do I do that? It sounds easy, but somehow I am struggling with it. The substring is always the last three characters. I tried the following:
df1 = df.drop(df[df.Hostname[-4:]== "abc"])
which gives me
UserWarning: Boolean Series key will be reindexed to match DataFrame index
so I tried to modify the values in that column and filter out all values that do not have "abc" at the end:
red = [c for c in df.Hostname[-4:] if c != 'abc']
which gives me
KeyError('%s not in index' % objarr[mask])
What am I doing wrong?
Thanks for your help!
Upvotes: 1
Views: 8528
Reputation: 863611
Use boolean indexing, add indexing with str to check the last 4 (or 3) characters of the Hostname column, and change the condition from == to !=:
df1 = df[df.Hostname.str[-4:] != "abc"]
Or maybe:
df1 = df[df.Hostname.str[-3:] != "abc"]
Sample:
import pandas as pd

df = pd.DataFrame({'Hostname':['k abc','abc','dd'],
                   'b':[1,2,3],
                   'c':[4,5,6]})
print (df)
  Hostname  b  c
0    k abc  1  4
1      abc  2  5
2       dd  3  6
df1 = df[df.Hostname.str[-3:] != "abc"]
print (df1)
  Hostname  b  c
2       dd  3  6
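As a side note on why the attempt in the question does not work: a plain slice on a Series selects rows, while the same slice behind the .str accessor is applied to each string. The sketch below reuses the sample frame and uses a slice of 2 rows only so the difference is visible on three rows; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Hostname': ['k abc', 'abc', 'dd'],
                   'b': [1, 2, 3],
                   'c': [4, 5, 6]})

# A plain slice on a Series picks rows: here the last two rows of the column.
row_slice = df.Hostname[-2:]

# The .str accessor applies the slice to every string: the last 3 characters of each value.
char_slice = df.Hostname.str[-3:]

# Comparing the per-row suffix gives a boolean mask with one entry per row,
# which is what boolean indexing needs.
mask = char_slice != 'abc'
df1 = df[mask]
print(df1)
```

Because the mask has the same index as df, no reindexing warning is raised.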
str.endswith also works if you need to check the trailing characters:
df1 = df[~df.Hostname.str.endswith("abc")]
print (df1)
  Hostname  b  c
2       dd  3  6
EDIT:
If you need to check whether abc appears anywhere in the last 4 characters and then remove those rows, first extract the last 4 characters and then use str.contains:
df1 = df[~df.Hostname.str[-4:].str.contains('abc')]
print (df1)
  Hostname  b  c
2       dd  3  6
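The two checks are not equivalent, which a slightly larger frame can show. This is only a sketch: the hostnames below are made up for illustration, and regex=False is passed on the assumption that the substring should be matched literally rather than as a regular expression (str.contains treats the pattern as a regex by default).

```python
import pandas as pd

# Hypothetical hostnames, chosen so the two checks disagree on one row.
df = pd.DataFrame({'Hostname': ['k abc', 'xabcd', 'abc-long-name', 'dd']})

# True only when the value ends exactly with "abc".
ends = df.Hostname.str.endswith('abc')

# True when "abc" appears anywhere in the last 4 characters.
in_last4 = df.Hostname.str[-4:].str.contains('abc', regex=False)

print(df[~ends])      # keeps 'xabcd', 'abc-long-name', 'dd'
print(df[~in_last4])  # keeps 'abc-long-name', 'dd'
```

The row 'xabcd' is the one that differs: it does not end with abc, but abc does occur within its last 4 characters.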
EDIT1:
For a default index, add reset_index. Python counts from 0, so the index values are 0, 1, 2, ...:
df1 = df[df.Hostname.str[-3:] != "abc"].reset_index(drop=True)
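To illustrate the effect on the sample data, the following sketch compares the index before and after the reset; the names filtered and renumbered are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Hostname': ['k abc', 'abc', 'dd'],
                   'b': [1, 2, 3],
                   'c': [4, 5, 6]})

# Filtering keeps the original row labels, so the surviving row is still labelled 2.
filtered = df[df.Hostname.str[-3:] != 'abc']

# drop=True discards the old labels instead of adding them as a new column.
renumbered = filtered.reset_index(drop=True)

print(filtered.index.tolist())
print(renumbered.index.tolist())
```

Without drop=True, the old labels would be kept as an extra column named index.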
Upvotes: 1