Reputation: 41
I am trying to loop over a set of URLs stored in the rows of a file, scrape the relevant information from each link, and store it all in a text file. My links are stored in a file, "ctp_output.csv". Currently I am only able to extract the information by providing a single link directly. I require some guidance.
import csv
import urllib2
from bs4 import BeautifulSoup

url = "http://www.thedrum.com/news/2015/07/29/mankind-must-get-ahead-technical-development-states-phds-mark-holden-following"
soup = BeautifulSoup(urllib2.urlopen(url))

with open('ctp_output.txt', 'w') as f:
    for tag in soup.find_all('p'):
        f.write(tag.text.encode('utf-8') + '\n')
Upvotes: 0
Views: 2156
Reputation: 402263
The next step is to open the csv file and then loop over each line, extracting information for each link. You can do that like this:
import csv

with open('test.csv', 'rb') as f:
    reader = csv.reader(f)
    for line in reader:
        url = line[0]  # assuming your url is your first column
        ....  # scraping code here
Upvotes: 1
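Putting the two pieces together, here is a minimal end-to-end sketch. It is written for Python 3 (so the file is opened in text mode rather than 'rb', and urllib2 would become urllib.request); the helper name read_urls and the sample links are my own, not from the question:

```python
import csv

def read_urls(csv_path):
    """Collect the URL from the first column of every row in the CSV."""
    urls = []
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        for line in reader:
            if line:                  # skip any blank rows
                urls.append(line[0])  # assuming the url is the first column
    return urls

# Example: write a small CSV like ctp_output.csv, then read the links back.
with open('ctp_output.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['http://example.com/a'],
        ['http://example.com/b'],
    ])

links = read_urls('ctp_output.csv')
# Each entry in links would then be fetched and parsed with BeautifulSoup
# exactly as in the single-url snippet from the question.
```

The scraping step stays as in the question; only the loop over the CSV is new here.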
Reputation: 936
You can import the csv into a pandas dataframe using pandas.read_csv().
Then iterate through the rows of the dataframe. Note that iterrows() yields (index, row) pairs, so unpack both:

for index, row in data_frame_name.iterrows():
    url = row[0]
    # use the url to get the information like you did in the question
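As a concrete sketch of the pandas route (assuming the CSV has no header row, hence header=None; the sample file contents are illustrative, not from the question):

```python
import pandas as pd

# Build a small headerless CSV like the asker's ctp_output.csv.
with open('ctp_output.csv', 'w') as f:
    f.write('http://example.com/a\nhttp://example.com/b\n')

df = pd.read_csv('ctp_output.csv', header=None)

urls = []
for index, row in df.iterrows():  # iterrows() yields (index, row) pairs
    urls.append(row[0])           # the url sits in the first column (label 0)
# each url would then be fetched and parsed as in the question
```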
Upvotes: 0