Reputation: 10443
I have a pandas DataFrame:
df_total_data
which has the following columns:
df_total_data.columns
Index([u'BBBlink', u'Name', u'_type'], dtype='object')
I want to drop all the rows that don't satisfy a given condition. In this case, the condition is that the BBBlink column must not contain the word secure.
I want to drop the rows in place, rather than have a function that returns None when the condition isn't met.
So I wrote this expression:
df_total_data.apply(lambda x: 'secure' not in x['BBBlink'],1 ).values
This returns a boolean array, but I don't know how to use it to drop the rows.
Edit:
I got an array:
array([ True, True, True, True, True,False....True])
Now, how can I use this array to drop the rows?
Upvotes: 1
Views: 96
Reputation: 863741
IIUC, you can use isin, which matches rows whose value is exactly equal to one of the listed values:
print(df_total_data)
  BBBlink  Name _type
0  secure  name     A
1  secure  name     A
2  secure  name     A
3  secure  name     A
4  secure  name     A
5     sre  name     A
print(df_total_data.BBBlink.isin(['secure']))
0     True
1     True
2     True
3     True
4     True
5    False
Name: BBBlink, dtype: bool
print(df_total_data[df_total_data.BBBlink.isin(['secure'])])
  BBBlink  Name _type
0  secure  name     A
1  secure  name     A
2  secure  name     A
3  secure  name     A
4  secure  name     A
print(df_total_data[~df_total_data.BBBlink.isin(['secure'])])
  BBBlink  Name _type
5     sre  name     A
But if the word appears together with other text in the string, you can use str.contains:
print(df_total_data)
        BBBlink  Name _type
0     secure qq  name     A
1        secure  name     A
2        secure  name     A
3        secure  name     A
4  secure aa ss  name     A
5           sre  name     A
print(df_total_data[~df_total_data.BBBlink.str.contains('secure')])
  BBBlink  Name _type
5     sre  name     A
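For reference, here is a minimal self-contained sketch of the str.contains approach. The sample data is made up to mirror the frame above, and the regex=False and na=False arguments are my own optional additions (they treat 'secure' as a literal substring and count missing values as non-matches), not something the question requires:

```python
import pandas as pd

# Hypothetical sample data mirroring the frame shown above
df_total_data = pd.DataFrame({
    "BBBlink": ["secure qq", "secure", "secure", "secure", "secure aa ss", "sre"],
    "Name": ["name"] * 6,
    "_type": ["A"] * 6,
})

# True where BBBlink contains the literal substring 'secure';
# na=False makes missing values count as "does not contain"
mask = df_total_data["BBBlink"].str.contains("secure", regex=False, na=False)

# Keep only the rows that do NOT contain 'secure', and re-assign the result
df_total_data = df_total_data[~mask]
print(df_total_data)
```

Re-assigning to the same name is what makes the filtering stick, since boolean indexing returns a new DataFrame.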
Upvotes: 1
Reputation: 81684
Once you have a boolean array, you can select only the rows where it is True by doing df[boolean_array], or only the rows where it is False by adding ~, as in df[~boolean_array].
As for your question, you can either use the drop method or do it yourself:
df_total_data[df_total_data.apply(lambda x: 'secure' not in x['BBBlink'],1 ).values]
Just remember that this does not modify the DataFrame in place, so you need to either assign the returned value to a new DataFrame or reassign it to the existing one.
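To make both options concrete, here is a rough sketch using a small made-up frame (the data and the names keep and filtered are only for illustration). It shows boolean indexing with assignment, and the drop method applied to the labels of the rows you want to remove:

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question
df_total_data = pd.DataFrame({
    "BBBlink": ["secure", "secure", "sre"],
    "Name": ["name"] * 3,
    "_type": ["A"] * 3,
})

# The boolean array from the question: True where 'secure' is NOT in BBBlink
keep = df_total_data.apply(lambda x: "secure" not in x["BBBlink"], axis=1).values

# Option 1: boolean indexing returns a new DataFrame, so assign it somewhere
filtered = df_total_data[keep]

# Option 2: drop the index labels of the rows to remove, modifying the frame in place
df_total_data.drop(df_total_data.index[~keep], inplace=True)

print(filtered)
print(df_total_data)
```

Both options leave the same rows; drop with inplace=True is the closest match to dropping rows "in place" as asked.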
By the way, you can simplify your condition a bit:
df_total_data[df_total_data['BBBlink'].apply(lambda x: 'secure' not in x)]
Upvotes: 1