retro_coder
retro_coder

Reputation: 456

How to read a CSV file that contains no headers using Pandas, capture data in the first column only and perform deletion?

I have a CSV file that contains information about people and all sorts of data that takes up more than 100 columns. There are no headers and my main intention is to grab the peoples' names only. Not the other data related to it. How can I do this?

This is my CSV file --- 'data.csv':

John   12 34 23 48 14 44 94 24  ...    #extends till 100
Becky  23 40 93 47 84 43 64 31  ...    #extends till 100
Lio    63 90 53 77 14 12 69 20  ...    #extends till 100

Following this, let's say I have a list in my code that is populated with lots of names:

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']

I opened my CSV file in Python and used list comprehension in order to read all the names in the first column and store them in a list with a variable 'people_list' assigned.

Now, for all elements in the people_list, if the element is not seen in the 'names' list, I want to delete that element in my CSV file. In this example, I want to delete Becky since she does not appear in the names list. This is what I tried so far...

Demo -- data.py:

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']
csv_filename = data.csv

with open(csv_filename, 'r') as readfile:
reader = csv.reader(readfile, delimiter=',') 
people_list = [row[0] for row in reader]

for person in people_list:
    if person not in names:
        id = people_list.index(person) #grab the index of the person in people_list who's not found in the names list.

        #using pandas
        df = pd.read_csv(csv_filename) #read data.csv file
        df.drop(df.index[id], in_place = True) #delete the row id for the person who does not exist in names list.
        df.to_csv(csv_filename, index = False, sep=',')  #close the csv file with no index
    else:
        print("This person is found in the names list")

Instead of deleting Becky, all the records in my CSV file were deleted (including Becky). Can someone explain how to do this?

Upvotes: 2

Views: 1153

Answers (1)

jezrael
jezrael

Reputation: 862661

Add parameter header=None to read_csv for default columns 0,1,2...:

df = pd.read_csv(csv_filename,  header=None)

names = ['Timothy', 'Joshua', 'Rio', 'Catherine', 'Poorva', 'Gome', 'Lachlan', 'John', 'Lio']

Then select first column by df[0] and test membership by Series.isin and filter by boolean indexing:

df = df[df[0].isin(names)]
print (df)

Last write to file:

df.to_csv(csv_filename1, index = False, header=None)

Upvotes: 1

Related Questions