Reputation: 914
I have 2 file (a .csv and a .xls). The .csv have only one column (e-mail). The .xls have many columns. I try to compare email columns in these two files and remove from .xls mail address not in .csv. The mails address are not sort.
I have write some code but I do not achieve my goal :
excel = pd.read_excel(file, skiprow=10, parse_cols = 'AL')
csv = pd.read_csv(namelist_file)
excel_keep = excel[excel.isin(csv)]
mask = excel.isin(csv.tolist())
excel[~mask]
print(excel_keep)
Have you an idea please ? Regards.
Upvotes: 0
Views: 451
Reputation: 2438
df_csv = pd.read_csv(path_to_csv)
df_xlsx = pd.read_excel(path_to_excel)
## assuming column header for email in both files is 'email'
## if not change it by df = df.rename(columns={'oldName': 'email'})
df_xlsx = df_xlsx[df_xlsx['email'].isin(df_csv['email'])]
hope that helps
Upvotes: 1