Reputation: 65
I am trying to drop duplicate rows, but I get the error: 'Series' object has no attribute 'remove'.
How can I replace the 'remove' call or otherwise fix the AttributeError?
If a row duplicates an entry in allMYemail.csv, that row must be removed. Here is my code:
import csv
import re
import json
import pandas as pd

df1 = pd.read_csv('allMYemail.csv')
df2 = pd.read_csv('MYallmatchagain.csv')
emailSet = set()
for i, row in df1.dropna().iterrows():
    emailSet.add(row['0'])
# print(emailSet)
output = []
for i,row in df2.iterrows():
    # print(row)
    Birthdate = row['Birthdate']
    Gender = row['Gender']
    Mobile2 = row['Mobile2']
    Salutation = row['Salutation']
    email = row['email']
    firstName = row['firstName']
    lastName = row['lastName']
    name = row['name']
    areaCode = row['areaCode']
    errorCode = row['errorCode']
    localNumber = row['localNumber']
    Status = row['Status']
    Domain = row['Domain']
    ReturnCode = row['ReturnCode']
    matched = False
    for emails in emailSet:
        if emails == email:
            matched = True
            break
    if matched:
        row.remove('Birthdate')
        row.remove('Gender')
        row.remove('Mobile2')
        row.remove('Salutation')
        row.remove('email')
        row.remove('firstName')
        row.remove('lastName')
        row.remove('name')
        row.remove('areaCode')
        row.remove('errorCode')
        row.remove('localNumber')
        row.remove('Status')
        row.remove('Domain')
        row.remove('ReturnCode')
    else:
        pass
    output_obj = {}
    output_obj['Birthdate'] = Birthdate
    output_obj['Gender'] = Gender
    output_obj['Mobile2'] = Mobile2
    output_obj['Salutation'] = Salutation
    output_obj['email'] = email
    output_obj['firstName'] = firstName
    output_obj['lastName'] = lastName
    output_obj['name'] = name
    output_obj['areaCode'] = areaCode
    output_obj['errorCode'] = errorCode
    output_obj['localNumber'] = localNumber
    output_obj['Status'] = Status
    output_obj['Domain'] = Domain
    output_obj['ReturnCode'] = ReturnCode
    output.append(output_obj)
df = pd.read_json(json.dumps(output))
# print(json.dumps(output))
df.to_csv(r'MYfinish.csv', index = None)
Any help would be very much appreciated.
Upvotes: 2
Views: 95
Reputation: 85
Your question is not clear about what you want to do. If you only want to remove fully duplicated rows within a single DataFrame, then @Renaud's solution will do the job. If you want to remove rows based on duplicates in the single column 'email', try this:
def firstline(d):
    # Return the first row of each group
    return d.reset_index(drop=True).loc[0]

result_df = df.groupby('email').apply(firstline)
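The same per-email deduplication can probably be written more directly with drop_duplicates and its subset parameter. Here is a minimal sketch, assuming a small made-up DataFrame (the sample rows are hypothetical; only the 'email' column name comes from the question):

```python
import pandas as pd

# Hypothetical sample data; only the 'email' column name comes from the question.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "name": ["Ann", "Bob", "Ann again"],
})

# Keep the first row seen for each email address and drop the later repeats.
result_df = df.drop_duplicates(subset="email", keep="first").reset_index(drop=True)
print(result_df)
```

Unlike the groupby version, this should keep the rows in their original order and leave 'email' as an ordinary column rather than moving it into the index.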
Upvotes: 2
Reputation: 2819
Did you try drop_duplicates() from pandas?
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
df.drop_duplicates(inplace=True)
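Note that drop_duplicates() only removes repeats within a single DataFrame. If the goal, as the question's code suggests, is to drop the rows of df2 whose email already appears in the other file, a boolean mask built with isin may be closer to what is needed. The row yielded by iterrows() is a Series, which has no remove method, and writing to it is not guaranteed to change the DataFrame anyway, so filtering the whole frame at once avoids the loop. Here is a minimal sketch, assuming small made-up frames in place of the two CSV files (the sample values and the names known_emails, keep and filtered are hypothetical; the '0' and 'email' column names come from the question):

```python
import pandas as pd

# Hypothetical stand-ins for allMYemail.csv and MYallmatchagain.csv.
df1 = pd.DataFrame({"0": ["a@x.com", None, "c@x.com"]})
df2 = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    "name": ["Ann", "Bob", "Cy", "Dee"],
})

# Emails already present in the first file (missing values ignored).
known_emails = set(df1["0"].dropna())

# Boolean mask: True where the email in df2 is NOT in the first file.
keep = ~df2["email"].isin(known_emails)

# Keep only the unmatched rows, with all their columns intact.
filtered = df2[keep].reset_index(drop=True)
print(filtered)
```

With the real files, df1 and df2 would come from pd.read_csv as in the question, and filtered.to_csv('MYfinish.csv', index=False) would write the result.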
Upvotes: 1