Reputation: 3965
I have a csv file which contains links to webpages. I'm collecting data from each link and saving it in a separate csv file.
Now, if I have to resume from the point where I left off, I have to manually delete the already-processed entries from the csv file and then run the code again.
I went through the documentation for the csv module, but couldn't find any function that serves this purpose.
I also went through other questions on Stack Overflow and other sites regarding this, but none of them helped.
Is there a way to delete rows from the file as they are processed?
Here is what I have right now:
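import pandas as p
df = p.read_csv("All_Links.csv")
for i in df.index:
    try:
        # .ix has been removed from pandas; .loc is the label-based replacement
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
        # now need to delete that row
    except Exception:
        # processing failed; stop so the run can be resumed later
        break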
Upvotes: 0
Views: 2717
Reputation: 46636
If you want to write the unprocessed data back to the csv file, that is, delete only the rows that have already been processed, you can modify your loop like this:
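import pandas as p
df = p.read_csv("All_Links.csv")
for i in df.index:
    try:
        # .ix has been removed from pandas; .loc is the label-based replacement
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
    except Exception:
        # row i failed, so it stays in the file
        break
    # row i is done: keep only the rows after it
    df.iloc[i + 1:].to_csv('All_Links.csv', index=False)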
But this rewrites the file on every iteration. It may be better to remember how far you got and write the file once, after the loop has finished:
import pandas as p
df = p.read_csv("All_Links.csv")
next_row = 0  # position of the first row that still needs processing
for i in df.index:
    try:
        # .ix has been removed from pandas; .loc is the label-based replacement
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
    except Exception:
        # something broke, so row i isn't processed; stop here
        break
    next_row += 1
# Now write the remaining unprocessed rows back to the csv file
df.iloc[next_row:].to_csv('All_Links.csv', index=False)
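Since the question mentions the csv module, here is a minimal sketch of the same "process, then write back only what is left" idea using only the standard library. The names `resume_links` and `process_link` are hypothetical, and the sketch assumes the file has a header row with a `MatchLink` column.

```python
import csv


def resume_links(csv_path, process_link, column="MatchLink"):
    """Process rows in order, then rewrite csv_path with only the unprocessed rows.

    Returns the number of rows processed successfully.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or [column]
        rows = list(reader)

    done = 0
    for row in rows:
        try:
            process_link(row[column])  # hypothetical callback supplied by the caller
        except Exception:
            break  # this row and everything after it stay in the file
        done += 1

    # Single write at the end: keep the header and the rows not yet processed.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows[done:])
    return done
```

Calling `resume_links("All_Links.csv", my_handler)` again after a failure picks up at the first row that was not processed, because the earlier rows are no longer in the file.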
Upvotes: 1
Reputation: 41003
Since you are already reading the whole file into the dataframe, you can just start iterating from the point where you left off. Let's say you stopped at i=23; you can do:
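import pandas as p
df = p.read_csv("All_Links.csv")
last_line_number = 23  # first row that still needs processing
for i in df.index[last_line_number:]:
    try:
        # .ix has been removed from pandas; .loc is the label-based replacement
        url = df.loc[i, 'MatchLink']
        # code to process the data in the link
        # made sure that processing has finished
        # row i is done; no need to delete it from the file
    except Exception:
        # note i so the next run can start from this row
        print("stopped at row", i)
        break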
This is the simplest way. A more robust approach would be to keep two files: one for the lines still to be processed and one for the lines already processed.
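One possible reading of the two-file idea, sketched with the standard csv module: the links file is never modified, and every finished link is appended to a second file that later runs use to skip work already done. The function names, the `process_link` callback and the one-link-per-line layout of the second file are assumptions made for illustration.

```python
import csv
import os


def load_done(done_path):
    """Return the set of links already recorded as processed."""
    if not os.path.exists(done_path):
        return set()
    with open(done_path, newline="") as f:
        return {row[0] for row in csv.reader(f) if row}


def process_pending(links_path, done_path, process_link, column="MatchLink"):
    """Process every link not yet listed in done_path; return how many were handled."""
    done = load_done(done_path)
    handled = 0
    with open(links_path, newline="") as src, open(done_path, "a", newline="") as log:
        writer = csv.writer(log)
        for row in csv.DictReader(src):
            url = row[column]
            if url in done:
                continue  # finished in an earlier run
            process_link(url)  # hypothetical callback; an exception stops the run
            writer.writerow([url])
            log.flush()  # record progress right away so a crash keeps it
            done.add(url)
            handled += 1
    return handled
```

If `process_link` raises, the exception propagates, but every link finished before it is already in the second file, so the next call to `process_pending` starts with the link that failed.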
Upvotes: 1