Jason

Reputation: 65

Remove all rows that are duplicates in Python

I am trying to drop duplicate rows, but I get the error: AttributeError: 'Series' object has no attribute 'remove'.
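
A minimal reproduction of the error, using a made-up two-column frame in place of MYallmatchagain.csv: each row yielded by iterrows() is a pandas Series, which has a drop() method but no remove(). Note that drop() only returns a trimmed copy of that one row; it does not remove anything from the DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for MYallmatchagain.csv
df2 = pd.DataFrame({"email": ["a@x.com", "b@x.com"], "Gender": ["F", "M"]})

# iterrows() yields (index, Series) pairs; take the first one
_, row = next(df2.iterrows())
print(type(row).__name__)        # Series
print(hasattr(row, "remove"))    # False, so row.remove(...) raises AttributeError

# Series.drop() returns a new Series without the given label;
# neither the row nor df2 is modified
trimmed = row.drop("Gender")
print(trimmed.index.tolist())    # ['email']
print(len(df2))                  # 2
```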

How can I replace the remove call, or otherwise fix the AttributeError?

If a row's email already appears in allMYemail.csv (that is, the row is a duplicate), the row must be removed. Here is my code:

import csv
import re
import json
import pandas as pd

df1 = pd.read_csv('allMYemail.csv')
df2 = pd.read_csv('MYallmatchagain.csv')

emailSet = set()
for i, row in df1.dropna().iterrows():
    emailSet.add(row['0'])
# print(emailSet)
output = []
for i,row in df2.iterrows():
    # print(row)
    Birthdate = row['Birthdate']
    Gender = row['Gender']
    Mobile2 = row['Mobile2']
    Salutation = row['Salutation']
    email = row['email']
    firstName = row['firstName']
    lastName = row['lastName']
    name = row['name']
    areaCode = row['areaCode']
    errorCode = row['errorCode']
    localNumber = row['localNumber']
    Status = row['Status']
    Domain = row['Domain']
    ReturnCode = row['ReturnCode']
    matched = False
    for emails in emailSet:
        if emails == email:
            matched = True
            break
    if matched:
        row.remove('Birthdate')
        row.remove('Gender')
        row.remove('Mobile2')
        row.remove('Salutation')
        row.remove('email')
        row.remove('firstName')
        row.remove('lastName')
        row.remove('name')
        row.remove('areaCode')
        row.remove('errorCode')
        row.remove('localNumber')
        row.remove('Status')
        row.remove('Domain')
        row.remove('ReturnCode')
    else:
        pass
    output_obj = {}
    output_obj['Birthdate'] = Birthdate 
    output_obj['Gender'] = Gender
    output_obj['Mobile2'] = Mobile2 
    output_obj['Salutation'] = Salutation 
    output_obj['email'] = email 
    output_obj['firstName'] = firstName 
    output_obj['lastName'] = lastName 
    output_obj['name'] = name
    output_obj['areaCode'] = areaCode
    output_obj['errorCode'] = errorCode 
    output_obj['localNumber'] = localNumber 
    output_obj['Status'] = Status 
    output_obj['Domain'] = Domain
    output_obj['ReturnCode'] = ReturnCode 
    output.append(output_obj)
df = pd.read_json(json.dumps(output))
# print(json.dumps(output))
df.to_csv(r'MYfinish.csv', index = None)

Any help would be very much appreciated.

Upvotes: 2

Views: 95

Answers (2)

TwerkingPanda

Reputation: 85

Since it is not clear exactly what you want to do: if you only want to remove fully duplicated rows within a single DataFrame, then @Renaud's solution will do the job. If you want to remove rows based on duplicates in a single column, 'email', try this:

def firstline(d):
    return d.reset_index(drop=True).loc[0]

result_df = df.groupby('email').apply(firstline)
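
For comparison, a sketch of the same "first row per email" idea using only drop_duplicates() with its subset parameter, on hypothetical sample data. Unlike the groupby version, which indexes the result by email, this keeps the original index and row order and needs no helper function.

```python
import pandas as pd

# Hypothetical sample data with a repeated email
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "name": ["Ann", "Bob", "Ann (again)"],
})

# Keep the first row for each email, as firstline() does per group
first_per_email = df.drop_duplicates(subset="email", keep="first")
print(first_per_email)
```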

Upvotes: 2

Renaud

Reputation: 2819

Did you try drop_duplicates() from pandas?

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html

df.drop_duplicates(inplace=True)
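
With its default keep='first', drop_duplicates() still leaves one copy of each duplicated row. If every copy should go, keep=False does that. The question's loop, however, compares the 'email' column of one file against column '0' of the other, which a boolean mask built with isin() expresses directly. A sketch with made-up frames standing in for the two CSV files:

```python
import pandas as pd

# Hypothetical stand-ins for allMYemail.csv and MYallmatchagain.csv
df1 = pd.DataFrame({"0": ["a@x.com", None, "c@x.com"]})
df2 = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "c@x.com", "b@x.com"],
    "name": ["Ann", "Bob", "Cy", "Bob"],
})

# Remove every copy of a fully duplicated row (rows 1 and 3 here)
no_dupes = df2.drop_duplicates(keep=False)

# Remove rows of df2 whose email appears in df1's column "0"
known = df1["0"].dropna()
unmatched = df2[~df2["email"].isin(known)]
print(unmatched)
```

The filtered frame can then be written with unmatched.to_csv('MYfinish.csv', index=False), which replaces the row-by-row rebuild through JSON.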

Upvotes: 1
