Reputation: 25
The Excel file email.xlsx is formatted like this:
name email
A [email protected]
B B@gmailcom
C [email protected]
A [email protected]
B [email protected]
The second file, outfile.csv, should contain this output:
name email count
A [email protected] 2
B B@gmailcom 2
C [email protected] 1
This is my Python code. First, I read the Excel file:
data_file=pd.read_excel('email.xlsx')
writer = csv.writer(open('outfiles.csv','wb'))
code = defaultdict(int)
for row in data_file:
    code[row[0]] += 1
# now write the file
for row in code.items():
    writer.writerow(row)
Error:
    writer.writerow(row)
TypeError: a bytes-like object is required, not 'str'
I am getting this error. Could you please help me out?
Upvotes: 1
Views: 1352
Reputation: 1919
If you just want to count the duplicates, use pandas.Series.unique() on the relevant column:
import pandas as pd
data = pd.read_excel('email.xlsx')
unique = data.column_name.unique()
duplicates = len(data)-len(unique)
print("number of duplicate rows is:",duplicates)
You just need to know the column name; you can list all column names with print(data.columns).
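If you want the per-row counts from the question's expected output rather than a single total, here is a minimal sketch. In Python 3, csv.writer writes str, so the output file has to be opened in text mode ('w', newline='') rather than 'wb'; opening it in binary mode is what raises the TypeError. The helper name write_counts and the sample addresses are made up for illustration, and the code assumes the two columns are name and email, in that order.

```python
import csv
import io
from collections import Counter


def write_counts(rows, out):
    """Count identical (name, email) rows and write them as CSV.

    `out` must be a text-mode file object (hypothetical helper, not a pandas API).
    """
    counts = Counter(tuple(row) for row in rows)
    writer = csv.writer(out)
    writer.writerow(["name", "email", "count"])
    # Counter keeps first-seen order, so rows come out in order of first appearance.
    for (name, email), n in counts.items():
        writer.writerow([name, email, n])
    return counts


# Made-up sample rows standing in for the spreadsheet contents.
rows = [
    ("A", "a@example.com"),
    ("B", "b@example.com"),
    ("C", "c@example.com"),
    ("A", "a@example.com"),
    ("B", "b@example.com"),
]

buffer = io.StringIO()
counts = write_counts(rows, buffer)
print(buffer.getvalue())

# With the real file (text mode, not 'wb'):
# data = pd.read_excel('email.xlsx')
# with open('outfile.csv', 'w', newline='') as f:
#     write_counts(data.itertuples(index=False, name=None), f)
```

Iterating over a DataFrame directly yields column names, not rows, which is why the commented usage goes through itertuples.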
Upvotes: 2