Reputation: 123
import pandas as pd
import csv
def load_source(filename):
    users = pd.read_csv(filename, encoding="utf8")
    return users
list_me = "Entrepreneur|Behold|=|Ã|±|Ã|®|Å|¥|ð|Ÿ|˜|‡|ð|à|¤|œ|à|¤|²"
users = load_source(latest_file)
filtered_followers_up = users[users.followersCount <= 1500]
filtered_followers_down = filtered_followers_up[filtered_followers_up.followersCount >= 0]
filtered_bio = filtered_followers_down[filtered_followers_down['bio'].dropna().str.contains(list_me)]
filtered_bio.to_csv(r'C:\Users\user\Downloads\test.csv', sep=',', encoding='utf-8')
print("Done!")
What I'm trying to do is filter my CSV file by removing all rows whose bio contains any of ("Entrepreneur|Behold|=|Ã|±|Ã|®|Å|¥|ð|Ÿ|˜|‡|ð|à|¤|œ|à|¤|²")
Upvotes: 1
Views: 58
Reputation: 13478
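The issue comes from the fact that you are calling dropna() on the bio column
inside the filter, so whenever a bio is missing the boolean mask's index no longer matches the dataframe's.
Your mask also selects the matching rows rather than excluding them.
Instead, remove NA values first and use the bitwise NOT operator ~
to remove all rows matching list_me: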
# Example dataframe
filtered_followers_down = pd.DataFrame({"bio": ["a", "Behold", pd.NA, "d", "Ã"]})
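# Drop only the rows whose bio is missing (a bare dropna() would also drop rows with NA in any other column)
filtered_followers_down = filtered_followers_down.dropna(subset=["bio"])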
filtered_bio = filtered_followers_down[
    ~filtered_followers_down["bio"].str.contains(list_me)
]
print(filtered_bio)
# Output
  bio
0   a
3   d
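As an alternative, here is a minimal sketch that avoids dropna() by passing na=False to str.contains, so missing bios count as "no match". It also folds the follower-count checks into the same mask with between(). The sample data and the shortened list_me are made up for illustration; only the column names followersCount and bio come from the question.

```python
import pandas as pd

# Shortened pattern for illustration; the question's list_me has more alternatives.
list_me = "Entrepreneur|Behold|Ã"

# Hypothetical sample data standing in for the CSV from the question.
users = pd.DataFrame(
    {
        "followersCount": [10, 200, 50, 3000, 7],
        "bio": ["a", "Behold", None, "d", "Ã"],
    }
)

# True where followersCount is within 0..1500 (both ends inclusive by default).
in_range = users["followersCount"].between(0, 1500)

# True where the bio matches the pattern; na=False treats missing bios as non-matching.
unwanted = users["bio"].str.contains(list_me, na=False)

# Keep rows that are in range and do not match the unwanted pattern.
filtered_bio = users[in_range & ~unwanted]
print(filtered_bio)
```

Note that this version keeps rows with a missing bio (row 2 here), which differs from the dropna() approach above. If those rows should go as well, adding `& users["bio"].notna()` to the mask should do it.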
Upvotes: 1