md1hunox

Reputation: 3965

Deleting row from CSV using python

I have a csv file that contains links to webpages. I'm collecting data from each link and saving it in a separate csv file.
Now, if I have to resume processing from the point where I left off, I have to manually delete the already-processed entries from the csv file and then run the code again.
I went through the documentation for the csv module, but couldn't find any function that serves this purpose.
I also went through other questions on Stack Overflow and other sites regarding this, but none of them helped.
Is there a way to delete each row once it has been processed?

Here is what I have right now:

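import pandas as pd

df = pd.read_csv("All_Links.csv")

for i in df.index:
    try:
        # .ix was removed in pandas 1.0; .loc does the same label lookup
        url = df.loc[i, 'MatchLink']

        # code to process the data in the link

        # made sure that processing has finished
        # Now need to delete that row
    except Exception:
        # stop at the first row that fails
        break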

Upvotes: 0

Views: 2717

Answers (2)

Viktor Kerkez

Reputation: 46636

If you want to write the unprocessed data back to the csv file, that is, delete only the rows that have already been processed, you can modify your algorithm like this:

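import pandas as pd

df = pd.read_csv("All_Links.csv")

for i in df.index:
    try:
        url = df.loc[i, 'MatchLink']
        # code process the data in the link
        # made sure that processing has finished
    except Exception:
        # something broke, so row i stays in the file
        break
    # row i is done: write back only the rows after it
    # (i is also the row position, because read_csv builds a default 0..n-1 index)
    df.iloc[i + 1:].to_csv('All_Links.csv', index=False)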

But this rewrites the file on every iteration. It may be better to remember the value of i where processing stopped and write the file once, after the loop has ended:

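import pandas as pd

df = pd.read_csv("All_Links.csv")

# position of the first unprocessed row; len(df) means "everything is done"
first_unprocessed = len(df)
for i in df.index:
    try:
        url = df.loc[i, 'MatchLink']
        # code process the data in the link
        # made sure that processing has finished
    except Exception:
        # something broke, row i isn't processed: remember i and stop
        # (i is also the row position, because read_csv builds a default 0..n-1 index)
        first_unprocessed = i
        break

# Now write the remaining unprocessed rows back to the csv file
df.iloc[first_unprocessed:].to_csv('All_Links.csv', index=False)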

Upvotes: 1

elyase

Reputation: 41003

Since you are already reading the whole file into the dataframe, you can just start iterating from the point where you left off. Let's say you left off at i=23; you can do:

import pandas as pd

df = pd.read_csv("All_Links.csv")

last_line_number = 23
for i in df.index[last_line_number:]:
    try:
        url = df.loc[i, 'MatchLink']
        # code process the data in the link
        # made sure that processing has finished
    except Exception:
        # something broke: note i so the next run can resume from it
        print("stopped at row", i)
        break

This is the simplest way. A more robust approach would be to keep two files: one for the lines still to be processed and one for the lines already processed.
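
A minimal sketch of the two-file idea, using only the standard csv module. The function name process_pending, the handle_row callback and the Processed_Links.csv file name are hypothetical, chosen for illustration; it also assumes the pending file has a header row and that handle_row raises an exception when a row fails. Each finished row is appended to the processed file, and the pending file is rewritten once at the end with whatever is left.

```python
import csv
import os


def process_pending(pending_path, done_path, handle_row):
    """Process rows from pending_path, moving each finished row to done_path.

    handle_row is a hypothetical callback that does the real work for one
    row (a dict keyed by column name) and raises an exception on failure.
    Returns the number of rows processed in this run.
    """
    with open(pending_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Write the header only when the processed file is new or empty.
    need_header = (not os.path.exists(done_path)
                   or os.path.getsize(done_path) == 0)

    processed = 0
    with open(done_path, "a", newline="") as done_file:
        writer = csv.DictWriter(done_file, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        for row in rows:
            try:
                handle_row(row)
            except Exception:
                # Stop at the first failure; this row stays pending.
                break
            writer.writerow(row)
            done_file.flush()
            processed += 1

    # Rewrite the pending file with only the rows still unprocessed.
    with open(pending_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows[processed:])

    return processed
```

It could be called as process_pending("All_Links.csv", "Processed_Links.csv", handle_row), where handle_row reads row["MatchLink"] and does the scraping. Running it again picks up with the first row that failed. Note that if the script is killed outright rather than failing inside handle_row, the pending file is not rewritten, so the rows already appended to the processed file would be handled again on the next run.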

Upvotes: 1
