Oyabi
Oyabi

Reputation: 914

Panda compare and remove data from csv and xls

I have 2 file (a .csv and a .xls). The .csv have only one column (e-mail). The .xls have many columns. I try to compare email columns in these two files and remove from .xls mail address not in .csv. The mails address are not sort.

I have write some code but I do not achieve my goal :

excel = pd.read_excel(file, skiprow=10, parse_cols = 'AL')
csv = pd.read_csv(namelist_file)
excel_keep = excel[excel.isin(csv)]
mask = excel.isin(csv.tolist())
excel[~mask]
print(excel_keep)

Have you an idea please ? Regards.

Upvotes: 0

Views: 451

Answers (1)

epattaro
epattaro

Reputation: 2438

df_csv = pd.read_csv(path_to_csv)
df_xlsx = pd.read_excel(path_to_excel)

## assuming column header for email in both files is 'email'
## if not change it by df = df.rename(columns={'oldName': 'email'})

df_xlsx = df_xlsx[df_xlsx['email'].isin(df_csv['email'])]

hope that helps

Upvotes: 1

Related Questions