Reputation: 404
Suppose I have the following DataFrame:
ter_id shstr value
6 2018002000000 201 1740.0
7 2018002000000 201 10759.0
8 2018002000002 201 2.0
How can I filter out the rows where the last six characters of ter_id are zeros? That is, the desired output is:
ter_id shstr value
8 2018002000002 201 2.0
I made a boolean function
def is_total(ter_id: str) -> bool:
    if ter_id[:-6] == "000000":
        return True
    return False
But using it fails with an error:
dataset.filter(is_total(dataset.ter_id))
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Pandas version is 1.0.1
Upvotes: 0
Views: 2725
Reputation: 148870
No need for a Python function, you can just use:
dataset[dataset['ter_id'].str.slice(-6) != '000000']
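As a minimal sketch of why this works where the function did not: is_total received the whole Series rather than one string, so the if statement had to collapse a Series of booleans into a single truth value, which pandas refuses to do. The element-wise comparison below returns one boolean per row instead. The sample frame is rebuilt by hand from the question, and ter_id is assumed to be stored as strings.

```python
import pandas as pd

# Sample data rebuilt from the question; ter_id is assumed to be a string column.
dataset = pd.DataFrame(
    {
        "ter_id": ["2018002000000", "2018002000000", "2018002000002"],
        "shstr": [201, 201, 201],
        "value": [1740.0, 10759.0, 2.0],
    },
    index=[6, 7, 8],
)

# Element-wise comparison: one boolean per row, usable as a mask.
mask = dataset["ter_id"].str.slice(-6) != "000000"
result = dataset[mask]
print(result)
```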
Upvotes: 0
Reputation: 862406
Change the indexing for the last 6 values to [-6:] and get all non-matching rows with boolean indexing:
df = dataset[dataset.ter_id.str[-6:] != "000000"]
print (df)
ter_id shstr value
8 2018002000002 201 2.0
Upvotes: 1
Reputation: 1786
What comes to mind is that you should first convert the column (ter_id) to string, then use the .str.contains method on the whole column:
df_filtered = df[~df.ter_id.str.contains("000000")].copy()
Here df is the name of your DataFrame. I used copy() to suppress the SettingWithCopyWarning on later assignments. Let me know if this helps.
P.S. You can put any string instead of zeros.
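One caveat: .str.contains matches the substring anywhere in the value, not only at the end. If an ID could have six zeros in the middle, .str.endswith is probably closer to what the question asks. A small sketch, where the second ID is a hypothetical value made up to show the difference:

```python
import pandas as pd

# Hypothetical IDs: the second one has six zeros in the middle, not at the end.
df = pd.DataFrame({"ter_id": ["2018002000000", "2000000180021", "2018002000002"]})

# Cast to str so the .str accessor works even if the column was numeric.
ids = df["ter_id"].astype(str)

by_contains = df[~ids.str.contains("000000")]  # drops any ID containing six zeros
by_endswith = df[~ids.str.endswith("000000")]  # drops only IDs ending in six zeros

print(by_contains["ter_id"].tolist())
print(by_endswith["ter_id"].tolist())
```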
Upvotes: 0
Reputation: 1230
For filtering a dataframe based on column values, there is rarely a reason to write your own function. You can pass the conditions as a boolean mask into df.loc[] (assuming your DataFrame is named df).
df = df.loc[df["ter_id"].str[-6:] != "000000"]
Upvotes: 3
Reputation: 323226
IIUC
df[~(df.ter_id%1000000==0)]
Out[256]:
ter_id shstr value
8 2018002000002 201 2.0
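The modulo test assumes ter_id has an integer dtype. If the column holds strings, as the str annotation in the question suggests, a cast is needed first. A sketch under that assumption, with the sample frame rebuilt by hand:

```python
import pandas as pd

# Sample data rebuilt from the question, with ter_id assumed to be strings.
df = pd.DataFrame(
    {
        "ter_id": ["2018002000000", "2018002000000", "2018002000002"],
        "shstr": [201, 201, 201],
        "value": [1740.0, 10759.0, 2.0],
    },
    index=[6, 7, 8],
)

# Cast to integers so that arithmetic works on the column.
ter_id_num = df["ter_id"].astype("int64")

# A number ends in six zeros exactly when it is divisible by 1,000,000.
filtered = df[ter_id_num % 1_000_000 != 0]
print(filtered)
```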
Upvotes: 0