Reputation: 177

How to find duplicates in a csv with python, and then alter the row

For a little background, this is the CSV file I'm starting with (the data is nonsensical and only used as a proof of concept):

Jackson,Thompson,[email protected],test,
Luke,Wallace,[email protected],test,
David,Wright,[email protected],test,
Nathaniel,Butler,[email protected],test,
Eli,Simpson,[email protected],test,
Eli,Mitchell,[email protected],,test2
Bob,Test,[email protected],test,

What I am attempting to do with this CSV on a larger scale is this: if the first value in a row is duplicated, I need to take the data from the second entry, append it to the row containing the first instance of that value, and then remove the duplicate row. For example, in the data above "Eli" appears twice. The first instance has "test" after the email value. The second instance of "Eli" has no value there; instead it has a value ("test2") in the next column over.

I would want it to go from this:
Eli,Simpson,[email protected],test,
Eli,Mitchell,[email protected],,test2

To this:
Eli,Simpson,[email protected],test,test2
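One way to express this rule, as a minimal sketch assuming every row has the same columns and that an empty string means "no value": keep a dictionary keyed on the first column, and for each duplicate, copy its non-empty fields into the empty slots of the first row seen with that key. The function name merge_duplicates and the example.com addresses are made up for illustration; rows are lists of strings, as csv.reader produces them.

```python
import csv
import io


def merge_duplicates(rows):
    """Merge rows that share the same first value.

    The first row seen for each key is kept; its empty fields are filled
    from later duplicates, and the duplicates themselves are dropped.
    """
    merged = {}  # first value -> kept row (dicts preserve insertion order)
    for row in rows:
        key = row[0]
        if key not in merged:
            merged[key] = list(row)  # copy so the caller's rows are not modified
            continue
        kept = merged[key]
        for i, value in enumerate(row):
            if i >= len(kept):
                kept.append(value)  # duplicate has extra columns: keep them
            elif kept[i] == "" and value != "":
                kept[i] = value  # fill only slots that are empty in the first row
    return list(merged.values())


# Hypothetical sample data in the same shape as the question's file
sample = """Eli,Simpson,eli@example.com,test,
Eli,Mitchell,eli2@example.com,,test2
Bob,Test,bob@example.com,test,
"""
rows = list(csv.reader(io.StringIO(sample)))
for row in merge_duplicates(rows):
    print(",".join(row))
```

To save the result, the merged rows can be passed to csv.writer(outfile).writerows(...), with the output file opened using newline=''.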

I have been able to import this CSV into my code using the code below:

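import csv

# Raw string so the backslashes in the Windows path are not read as escape sequences
with open(r'C:\Projects\Python\Test.csv', newline='') as f:
    csv_f = csv.reader(f)

    test_list = []
    for row in csv_f:
        # Collect the first value (the first name) of each row
        test_list.append(row[0])

# Print once, after the whole file has been read
print(test_list)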

At this point I can import my CSV and put the first names into a list, but I'm not sure how to compare the entries to make the changes I'm looking for. I'm a Python rookie, so any help or guidance would be greatly appreciated.

Upvotes: 0

Answers (2)

Cloasis

Reputation: 1

I'm kind of a newbie in Python as well, but I would suggest using csv.DictReader and treating the CSV file as a sequence of dictionaries, meaning every row is a dictionary. This way you can iterate through the names easily. Second, I would suggest building a list of the names you have already seen as you iterate through the file, so you can check whether a name is already known. For example:

name_list.append("eli")

Then, when "eli" in name_list is already true for a new row, add that row's key and value to the first row with that name instead of keeping the duplicate.
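A minimal sketch of this DictReader idea, under some assumptions: the file has no header row, so column names are supplied through DictReader's fieldnames argument (the names in FIELDS and the function name merge_with_dictreader are hypothetical), and a field counts as missing when it is an empty string. Instead of a separate list of names, it uses a dictionary from name to first row, which doubles as the "known names" check (and a membership test on a dict is faster than on a list).

```python
import csv
import io

# Hypothetical column names; the question's file has no header row
FIELDS = ["first_name", "last_name", "email", "col4", "col5"]


def merge_with_dictreader(f):
    """Return one dict per distinct first name, filling empty fields from duplicates."""
    first_rows = {}  # first name -> first row seen with that name (the "known names")
    for row in csv.DictReader(f, fieldnames=FIELDS):
        name = row["first_name"]
        if name not in first_rows:
            first_rows[name] = row
            continue
        # Known name: copy each non-empty value into the first row if its slot is empty
        for key, value in row.items():
            if value and not first_rows[name].get(key):
                first_rows[name][key] = value
    return list(first_rows.values())


# Usage with in-memory sample data; a real file would be opened with newline=''
sample = io.StringIO("Eli,Simpson,a@example.com,test,\nEli,Mitchell,b@example.com,,test2\n")
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writerows(merge_with_dictreader(sample))
print(out.getvalue())
```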

I don't know if this is best practice, so don't roast me, but it is a simple and quick solution.

This will help you practice iterating through lists and dictionaries as well.

Here is a helpful link for reading about csv handling.

Upvotes: 0

Richard Ewing

Reputation: 88

If you want to use pandas, you could use the DataFrame .drop_duplicates() method. An example would look something like this:

import pandas as pd

data = pd.read_csv(r'C:\a file with addresses')
# Keep only the first row for each value in the subset column(s); drop_duplicates returns a new DataFrame
data = data.drop_duplicates(subset=['thing_to_drop'], keep='first')

See the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html
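Note that drop_duplicates only removes the later rows, so the "test2" value from the second Eli row would be lost. If you want the merge the question describes, one option (a different technique from drop_duplicates) is groupby(...).first(), which takes the first non-null value of each column within each group. A minimal sketch, assuming the file has no header row (hence header=None and the hypothetical column names passed via names=) and relying on read_csv's default of reading empty fields as NaN:

```python
import io

import pandas as pd

# Hypothetical column names; the sample data has no header row
COLUMNS = ["first_name", "last_name", "email", "col4", "col5"]

sample = io.StringIO(
    "Eli,Simpson,a@example.com,test,\n"
    "Eli,Mitchell,b@example.com,,test2\n"
    "Bob,Test,c@example.com,test,\n"
)
df = pd.read_csv(sample, header=None, names=COLUMNS)

# For each first name, take the first non-null value in every other column;
# sort=False keeps the groups in order of first appearance
merged = df.groupby("first_name", sort=False).first().reset_index()
print(merged)

# To write it back without the index column or a header row:
# merged.to_csv("merged.csv", index=False, header=False)
```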

Upvotes: 1
