Vignesh

Reputation: 25

Counting duplicate rows in Excel using Python gives TypeError: a bytes-like object is required, not 'str'

The Excel file email.xlsx is formatted like this:

name       email
A          [email protected]
B          B@gmailcom
C          [email protected]
A          [email protected]
B          [email protected]

The second file, outfile.csv, should contain this output:

name       email               count
A          [email protected]         2
B          B@gmailcom          2
C          [email protected]          1

This is my Python code. First, I read the Excel file:

data_file=pd.read_excel('email.xlsx')
writer = csv.writer(open('outfiles.csv','wb'))
code = defaultdict(int)
for row in data_file:
    code[row[0]] += 1
# now write the file
for row in code.items():
   writer.writerow(row)

Error:

writer.writerow(row) TypeError: a bytes-like object is required, not 'str'

Could you please help me fix this error?

Upvotes: 1

Views: 1352

Answers (1)

Guinther Kovalski

Reputation: 1919

If you just want to count the duplicates, call unique() on the relevant column (pandas.Series.unique(); a DataFrame itself has no unique() method):

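import pandas as pd

data = pd.read_excel('email.xlsx')
# Replace column_name with the column to check, e.g. data.email
unique = data.column_name.unique()
# Every row beyond the first occurrence of a value counts as a duplicate
duplicates = len(data) - len(unique)
print("number of duplicate rows is:", duplicates)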

You just need to know the column name; print(data.columns) lists all of them.
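
As for the TypeError itself: in Python 3, csv.writer writes str, so the output file has to be opened in text mode ('w' with newline='') rather than 'wb'. Separately, iterating over a DataFrame directly yields its column labels, not its rows. Below is a minimal sketch of the per-row count using only the standard library. The helper names count_duplicates and write_counts and the example.com addresses are placeholders for illustration, not part of the original question.

```python
import csv
from collections import Counter


def count_duplicates(rows):
    """Count how many times each (name, email) pair occurs."""
    return Counter(tuple(row) for row in rows)


def write_counts(path, counts):
    """Write one line per distinct row, followed by its count."""
    # Text mode ('w'), not 'wb': csv.writer emits str in Python 3.
    # newline='' lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "email", "count"])
        for (name, email), n in counts.items():
            writer.writerow([name, email, n])


# Placeholder rows standing in for the spreadsheet contents (hypothetical data).
rows = [
    ("A", "a@example.com"),
    ("B", "b@example.com"),
    ("C", "c@example.com"),
    ("A", "a@example.com"),
    ("B", "b@example.com"),
]
counts = count_duplicates(rows)

if __name__ == "__main__":
    write_counts("outfile.csv", counts)
```

With pandas, the rows could come from the DataFrame instead of a hard-coded list, for example data_file.itertuples(index=False, name=None), which yields one plain tuple per row.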

Upvotes: 2
