I am trying to append a value in the adjacent column of a row in sitemap_bp.csv if that line contains a string from mobilesitemap-browse.csv. I'm not able to iterate through the lines in mobilesitemap-browse.csv; it gets stuck on the first line. How do I go about solving this?
import csv
with open('sitemap_bp.csv','r') as csvinput:
    with open('mobilesitemap-browse.csv','r') as csvinput2:
        with open('output.csv', 'w') as csvoutput:
            writer = csv.writer(csvoutput, lineterminator='\n')
            sitemap = csv.reader(csvinput)
            mobilesitemap = csv.reader(csvinput2)
            all = []
            row = next(sitemap)
            row.append('mobile')
            all.append(row)
            for mobilerow in mobilesitemap:
                for row in sitemap:
                    #print row[0]
                    if mobilerow[1] in row[0]:
                        #print row, mobilerow[1]
                        all.append((row[0], mobilerow[1]))
                    else:
                        all.append(row)
            writer.writerows(all)
Personally I'd parse the data from sitemap_bp.csv into a dictionary first, then use that dictionary to populate the new file.
import csv
import re

with open('sitemap_bp.csv','r') as csvinput, \
     open('mobilesitemap-browse.csv','r') as csvinput2, \
     open('output.csv', 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    sitemap = csvinput  # no reason to pipe this through csv.reader
    mobilesitemap = csv.reader(csvinput2)

    # pattern for the item number embedded in each URL; as written, `_{7}`
    # requires seven literal underscores, so adjust it to match your IDs
    item_number = re.compile(r"\d{5}_\d{7}_{7}")
    item_number_mapping = {item_number.search(line).group(): line.strip()
                           for line in sitemap if item_number.search(line)}
    # makes a dictionary {item_number: full_url, ...} for each item in sitemap

    # alternative to the above, consider:
    # item_number_mapping = {}
    # for line in sitemap:
    #     line = line.strip()
    #     match = item_number.search(line)
    #     if match:
    #         item_number_mapping[match.group()] = match.string

    # skip mobile rows whose item number isn't in the sitemap (avoids a KeyError)
    all = [row + [item_number_mapping[row[1]]]
           for row in mobilesitemap if row[1] in item_number_mapping]
    writer.writerows(all)
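My guess is that after the first pass through your outer for loop, the inner loop tries to iterate through sitemap again but can't, because the underlying file is already exhausted. So only the first row of mobilesitemap-browse.csv ever gets compared against the sitemap. The minimal change for that would be: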
for mobilerow in mobilesitemap:
    csvinput.seek(0)  # seek to the start of the file object
    next(sitemap)  # skip the header row
    for row in sitemap:
        #print row[0]
        if mobilerow[1] in row[0]:
            #print row, mobilerow[1]
            all.append((row[0], mobilerow[1]))
        else:
            all.append(row)
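The exhaustion effect, and why rewinding the file fixes it, can be seen in a few lines. This sketch uses an in-memory io.StringIO as a stand-in for sitemap_bp.csv; the sample contents are made up for illustration:

```python
import csv
import io

# In-memory stand-in for sitemap_bp.csv (hypothetical contents).
csvinput = io.StringIO("url\nhttp://example.com/a\nhttp://example.com/b\n")
sitemap = csv.reader(csvinput)

first_pass = [row for row in sitemap]   # consumes every line of the file
second_pass = [row for row in sitemap]  # file position is at EOF, so this is empty

csvinput.seek(0)                        # rewind the underlying file object
third_pass = [row for row in sitemap]   # the same reader sees all lines again

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```

The reader holds no rows of its own; it just pulls lines from the file object, so resetting the file position is enough to make it yield rows again.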
But the obvious reason not to do this is that it reads through your sitemap_bp.csv file once per row in mobilesitemap-browse.csv, rather than just once like my code.
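Another option, not from the original answer, is to read sitemap_bp.csv into a list once so the inner loop can iterate over it as many times as needed without seeking. It still does one comparison per pair of rows, so it only removes the repeated file reads, not the nested-loop cost. This sketch also swaps the loop order so each sitemap row is written exactly once; the file contents are hypothetical in-memory stand-ins:

```python
import csv
import io

# Hypothetical stand-ins for the two input files (made-up contents).
sitemap_file = io.StringIO("url\nhttp://example.com/p/11111\nhttp://example.com/p/22222\n")
mobile_file = io.StringIO("widget,11111\n")

sitemap = csv.reader(sitemap_file)
header = next(sitemap) + ["mobile"]
sitemap_rows = list(sitemap)                 # read the sitemap into memory once
mobile_rows = list(csv.reader(mobile_file))  # lists can be looped over repeatedly

output = [header]
for row in sitemap_rows:
    # First value from column 1 of the mobile file that occurs inside this URL, or None.
    match = next((m[1] for m in mobile_rows if m[1] in row[0]), None)
    output.append(row + [match] if match is not None else row)

print(output)
```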
If you also need a list of the URLs in sitemap_bp.csv that don't correspond to anything in mobilesitemap-browse.csv, you're probably best served by adding each item number to a set as you see it, then using a set difference to get the unseen items. This takes a little tinkering, but...
# instead of all = [row + [item number ...
seen = set()
all = []
for row in mobilesitemap:
    item_no = row[1]
    if item_no in item_number_mapping:
        all.append(row + [item_number_mapping[item_no]])
        seen.add(item_no)
# after this for loop, `all` is identical to the list comp version

unmatched_items = [item_number_mapping[item_num] for item_num in
                   set(item_number_mapping.keys()) - seen]
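Putting the dictionary lookup and the set difference together, here is a self-contained sketch on hypothetical in-memory data. The regex r"\d{5}" and the sample rows are assumptions for illustration only, not the item-number format from the question; substitute whatever pattern actually identifies an item in your URLs:

```python
import csv
import io
import re

# Hypothetical inputs: one URL per line, and rows whose column 1 holds an item number.
sitemap_text = "http://example.com/p/12345\nhttp://example.com/p/67890\nhttp://example.com/about\n"
mobile_text = "widget,12345\ngadget,99999\n"

item_number = re.compile(r"\d{5}")  # assumed ID format for this example

# Build {item_number: full_url} in a single pass over the sitemap.
item_number_mapping = {}
for line in io.StringIO(sitemap_text):
    line = line.strip()
    match = item_number.search(line)
    if match:
        item_number_mapping[match.group()] = line

# Join mobile rows to their URL and remember which item numbers were used.
seen = set()
joined = []
for row in csv.reader(io.StringIO(mobile_text)):
    item_no = row[1]
    if item_no in item_number_mapping:
        joined.append(row + [item_number_mapping[item_no]])
        seen.add(item_no)

# URLs whose item number never appeared in the mobile file (sorted for stable output).
unmatched_items = sorted(item_number_mapping[k] for k in set(item_number_mapping) - seen)

print(joined)           # [['widget', '12345', 'http://example.com/p/12345']]
print(unmatched_items)  # ['http://example.com/p/67890']
```

Both files are read exactly once, and each lookup is a dictionary access rather than a scan of the other file.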