import requests
from bs4 import BeautifulSoup
webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")
for story_heading in soup.find_all(class_="story-heading"):
    articles = story_heading.text.replace('\n', '').replace('  ', '')
    print(articles)
Here is my code. It prints all the article titles on the website, but as separate strings:
Looking Back: 1980 | Funny, but Not Fit to Print
Brooklyn Studio With Room for Family and a Dog
Search for Homes for Sale or Rent
Sell Your Home
I want to turn these into a single list, e.g. ['Search for Homes for Sale or Rent', 'Sell Your Home', ...], which would let me do further manipulations such as random.choice.
I tried this inside the loop:
alist = articles.split("\n")
print(alist)
['Looking Back: 1980 | Funny, but Not Fit to Print']
['Brooklyn Studio With Room for Family and a Dog']
['Search for Homes for Sale or Rent']
['Sell Your Home']
That gives me a separate one-element list for each title, not the single list I need. I'm stuck; can you please help me with this part of the code?
You are overwriting articles with the next value on every pass through the loop. Instead, make articles a list and append to it in each iteration:
import requests
from bs4 import BeautifulSoup
webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")
articles = []  # one list that collects every heading
for story_heading in soup.find_all(class_="story-heading"):
    articles.append(story_heading.text.replace('\n', '').replace('  ', ''))
print(articles)
The output is huge, so this is a small sample of what it looks like:
['Global Deal Reached to Curb Chemical That Warms Planet', 'Accord Could Push A/C Out of Sweltering India’s Reach ',....]
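To see the difference without hitting the network, here is a minimal offline sketch. The raw_headings list is hypothetical and just stands in for the story_heading.text values the scraper would produce; it assumes nothing about the real page.

```python
# Hypothetical raw heading texts standing in for story_heading.text values
raw_headings = ["\n  Sell Your Home  \n", "\n  Search for Homes for Sale or Rent  \n"]

# Overwriting: the variable is reassigned each pass, so only the last value survives
for text in raw_headings:
    last_only = text.strip()

# Appending: every cleaned value is kept in one list
collected = []
for text in raw_headings:
    collected.append(text.strip())

print(last_only)   # Search for Homes for Sale or Rent
print(collected)   # ['Sell Your Home', 'Search for Homes for Sale or Rent']
```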
Furthermore, you don't need those replace calls: stripping the surrounding whitespace from each heading is enough. Call strip() on story_heading.text instead:
articles.append(story_heading.text.strip())
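A quick sketch of what strip() does and doesn't do, using made-up sample strings (not real scraped data): it trims whitespace only at the two ends and leaves anything inside the text alone. If a heading ever had internal line breaks or runs of spaces, " ".join(text.split()) is one plain-Python way to collapse them.

```python
raw = "\n    Sell Your Home\n  "
cleaned = raw.strip()
print(repr(cleaned))  # 'Sell Your Home'

# strip() only trims the ends; the newline in the middle is left as-is
inner = "  Looking Back: 1980\nFunny  "
print(repr(inner.strip()))  # 'Looking Back: 1980\nFunny'

# split() with no argument splits on any whitespace run; join puts single spaces back
collapsed = " ".join(inner.split())
print(collapsed)  # Looking Back: 1980 Funny
```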
Putting this together with a list comprehension gives a final solution like this:
import requests
from bs4 import BeautifulSoup
webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")
# Build the whole list in one step; strip() trims surrounding whitespace from each heading
articles = [story_heading.text.strip() for story_heading in soup.find_all(class_="story-heading")]
print(articles)
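Once articles is a real list, the random.choice use case from the question works directly. This sketch hard-codes a hypothetical articles list (the titles from the question) so it runs offline. The emptiness check is there because random.choice raises IndexError on an empty sequence, which is what you would get if no element on the page has the story-heading class.

```python
import random

# Hypothetical stand-in for the scraped articles list
articles = [
    "Looking Back: 1980 | Funny, but Not Fit to Print",
    "Brooklyn Studio With Room for Family and a Dog",
    "Search for Homes for Sale or Rent",
    "Sell Your Home",
]

# Guard against an empty list before picking a random headline
if articles:
    pick = random.choice(articles)
    print(pick)
```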