Reputation: 71
I have a very basic question about my code below:
#Python code to scrape the shipment URLs
from bs4 import BeautifulSoup
import urllib.request
import urllib.error
import urllib
# read urls of websites from text file > change it to where you stock the file
list_open = open(r"C:\Users\**\data.csv")
#skips the header
read_list = list_open.readlines()[1:]
import os
file_path = os.path.join('c:\\**', 'ShipmentUpdates.txt')
for url in read_list:
    soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html5lib")
    # parse shipment info
    shipment = soup.find_all("span")
    Preparation = shipment[0]
    Sent = shipment[1]
    InTransit = shipment[2]
    Delivered = shipment[3]
    url = url.strip()
    line= f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
    print (line)
file='c:\\**\ShipmentUpdates.txt'
with open(file, 'w') as filetowrite:
    filetowrite.write(line+'\n')
In my output, I have three lines:
http://carmoov.fr/CfQd ; Preparation on 06/01/2022 at 17:45; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 10:31
http://carmoov.fr/CfQs ; Preparation on 06/01/2022 at 15:01; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:27
http://carmoov.fr/CfQz ; Preparation on 06/01/2022 at 11:18; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:56
But in my text file, it is only one line:
http://carmoov.fr/CfQz ; Preparation on 06/01/2022 at 11:18; Sent on 06/01/2022 at 18:14; InTransit ; Delivered on 07/01/2022 at 11:56
I need exactly the same three lines in the text file. What is wrong here? Thank you in advance!
Upvotes: 2
Views: 326
Reputation: 11188
The last line of code in your loop keeps re-assigning a value to line, overwriting (replacing) whatever value it had before. Only that last value of line ends up being written to your file.
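To see the overwriting in isolation, here is a minimal sketch without any scraping. The URLs are hypothetical placeholders, not the ones from the question:

```python
# Hypothetical sample URLs, used only for illustration.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

for url in urls:
    # Each pass rebinds `line`, discarding the string from the previous pass.
    line = f"{url} ; Preparation ..."
    print(line)  # printing happens inside the loop, so every value shows up

# After the loop, only the value from the final pass is still bound to `line`.
print(repr(line))
```

That is why the console shows every URL while a single write after the loop only sees the last one.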
I recommend you keep a list of lines as you step through the loop:
lines = []
for url in read_list:
    ...
    line= f"{url} ; Preparation ..."
    lines.append(line)
    print (line)
Then, write that list with the writelines() method of your file.
Despite its name, writelines() doesn't add line endings (to... make a "line" of text), so you have to add those yourself with line+'\n':
file='c:\\**\ShipmentUpdates.txt'
with open(file, 'w') as filetowrite:
    filetowrite.writelines([line+'\n' for line in lines])
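Putting the two pieces together, a self-contained sketch of the list-plus-writelines() approach could look like this. The `results` data and the temporary output path are assumptions made so the snippet runs without the real pages or the real `c:\\**` path:

```python
import os
import tempfile

# Hypothetical stand-ins for the values scraped from each page.
results = [
    ("http://example.com/a", "Delivered on 07/01/2022 at 10:31"),
    ("http://example.com/b", "Delivered on 07/01/2022 at 11:27"),
    ("http://example.com/c", "Delivered on 07/01/2022 at 11:56"),
]

lines = []
for url, delivered in results:
    line = f"{url} ; Delivered {delivered}"
    lines.append(line)  # keep every line, not just the latest one

# Assumed output location: a fresh temporary directory instead of the real path.
out_path = os.path.join(tempfile.mkdtemp(), "ShipmentUpdates.txt")
with open(out_path, "w") as filetowrite:
    # writelines() writes the strings as-is, so the newline is added here.
    filetowrite.writelines(line + "\n" for line in lines)

# Read the file back to check that all three lines were written.
with open(out_path) as check:
    written = check.read().splitlines()
```

The file is still opened only once, after the loop, so mode 'w' does no harm here.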
Upvotes: 2
Reputation: 21
You are writing to your file after your loop finishes, so you only write the last line you stored. Try accumulating all of your lines instead:
line += f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}" + "\n"
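One detail to watch with this approach: `line` has to exist before the loop, otherwise the first `line +=` raises a NameError. A rough sketch, with hypothetical URLs and an in-memory buffer standing in for the real file:

```python
import io

# Hypothetical sample URLs, used only for illustration.
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]

line = ""  # start with an empty string so += has something to append to
for url in urls:
    # Append this entry, plus its own newline, to everything collected so far.
    line += f"{url} ; Preparation ..." + "\n"

# io.StringIO is an assumed stand-in for the opened output file.
buffer = io.StringIO()
buffer.write(line)  # each entry already ends in "\n", so no extra one is added
```

With the real file, the same single write() call after the loop would then put all collected lines in the text file.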
Upvotes: 2
Reputation: 2719
Change this:
line= f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}"
To this:
line += f"{url} ; Preparation {Preparation.getText()}; Sent {Sent.getText()}; InTransit {InTransit.getText()}; Delivered {Delivered.getText()}\n"
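You need to concatenate instead of replacing. Initialize line = "" before the loop so that += has something to append to, and drop the extra +'\n' in the write() call, since each entry now already ends with a newline.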
Upvotes: 2