Reputation: 665
I have a db.sql file that contains many URLs, like the following:
....<td class=\"column-1\"><a href=\"http://geni.us/4Lk5\" rel=nofollow\"><img src=\"http://www.toprateten.com/wp-content/uploads/2016/08/25460A-Panini-Press-Gourmet-Sandwich-Maker.jpg \" alt=\"25460A Panini Press Gourmet Sandwich Maker\" height=\"100\" width=\"100\"></a></td><td class=\"column-2\"><a href=\"http://geni.us/4Lk5\" rel=\"nofollow\">25460A Panini Press Gourmet Sandwich Maker</a></td><td class....
As you can see, the file contains the link http://geni.us/4Lk5.
I also have a product.csv file that contains an ID (like 4Lk5 above) and the matching Amazon product URL, as follows:
4Lk5 8738 8/16/2016 0:20 https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8
Jx9Aj2 8738 8/22/2016 20:16 https://www.amazon.com/gp/product/B007EUSL5U/ref=as_li_qf_sp_asin_il_tl?ie=UTF8
9sl2 8738 8/22/2016 20:18 https://www.amazon.com/gp/product/B00C3GQGVG/ref=as_li_qf_sp_asin_il_tl?ie=UTF8
As you can see, the ID 4Lk5 is paired with an Amazon product URL.
I have already read the CSV file and picked out only the ID and the Amazon product URL with Python:
    import csv

    def openFile(filename, mode):
        result = []
        with open(filename, mode) as csvfile:
            spamreader = csv.reader(csvfile, delimiter = ',', quotechar = '\n')
            for row in spamreader:
                # keep only the ID (first column) and the Amazon URL (fourth column)
                result.append({
                    "genu_id": row[0],
                    "amazon_url": row[3]
                })
        return result
Now I need to add code that finds each http://geni.us/ URL in db.sql by its genu_id and replaces it with the corresponding amazon_url from the code above.
Please help me.
Upvotes: 0
Views: 65
Reputation: 25799
There is no need for regex if you have such a predefined structure. If all links are in the form of http://geni.us/<geni_id>, you can do it with a simple str.replace() by reading each row of your CSV and replacing the matches in your SQL file. Something like:
    import csv

    # newline="" is what the csv module expects on Python 3 (on Python 2, open with "rb" instead)
    with open("product.csv", "r", newline="") as source, open("db.sql", "r+") as target:  # open the files
        sql_contents = target.read()  # read the SQL file contents
        reader = csv.reader(source, delimiter="\t")  # build a CSV reader, tab as a delimiter
        for row in reader:  # read the CSV line by line
            # replace any match of http://geni.us/<first_column> with the fourth column's value
            sql_contents = sql_contents.replace("http://geni.us/{}".format(row[0]), row[3])
        target.seek(0)  # seek back to the start of your SQL file
        target.truncate()  # truncate the rest
        target.write(sql_contents)  # write back the changed content
        # ...
        # Profit? :D
Of course, if your original CSV file is comma-delimited, replace the delimiter in the csv.reader() call. The one you presented here seems tab-delimited.
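One caveat with plain str.replace(): if one ID happens to be a prefix of another (say 4Lk5 and a hypothetical 4Lk5x), the shorter one would be replaced inside the longer link. If that is a concern, a possible variation is to capture the whole ID and look it up in a dict. This does bring a small regex back in, so treat it as an optional sketch rather than a requirement. It assumes the IDs contain only letters and digits; the helper names load_mapping and replace_links are made up for illustration.

```python
import csv
import io
import re


def load_mapping(csv_file, delimiter="\t"):
    """Map each ID (first column) to its Amazon URL (fourth column)."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    # Skip rows that are too short to contain a URL column
    return {row[0]: row[3] for row in reader if len(row) > 3}


def replace_links(sql_text, mapping):
    """Replace http://geni.us/<id> with the mapped URL; unknown IDs stay as-is."""
    # Assumption: IDs consist of ASCII letters and digits only
    pattern = re.compile(r"http://geni\.us/([A-Za-z0-9]+)")
    # The callback returns the original match when the ID is not in the mapping
    return pattern.sub(lambda m: mapping.get(m.group(1), m.group(0)), sql_text)


# Small in-memory demo instead of the real product.csv and db.sql
sample_csv = io.StringIO(
    "4Lk5\t8738\t8/16/2016 0:20\t"
    "https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8\n"
)
sample_sql = '<a href=\\"http://geni.us/4Lk5\\" rel=\\"nofollow\\">'

mapping = load_mapping(sample_csv)
print(replace_links(sample_sql, mapping))
```

With real files you would pass an open product.csv handle to load_mapping() and the contents of db.sql to replace_links(), then write the result back the same way as in the snippet above.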
Upvotes: 1