DarKsi

Reputation: 13

Python - converting to list

import requests
from bs4 import BeautifulSoup 

webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(requests.get("http://www.nytimes.com/").text, "html.parser")
for story_heading in soup.find_all(class_="story-heading"):
    articles = story_heading.text.replace('\n', '').replace('  ', '')
    print(articles)

Here is my code. It prints all the article titles on the website, each as a separate string:

Looking Back: 1980 | Funny, but Not Fit to Print

Brooklyn Studio With Room for Family and a Dog

Search for Homes for Sale or Rent

Sell Your Home

I want to collect these into a single list, e.g. ['Search for Homes for Sale or Rent', 'Sell Your Home', ...], which would let me do further manipulations such as random.choice.
I tried:

alist = articles.split("\n")
print (alist)

['Looking Back: 1980 | Funny, but Not Fit to Print']

['Brooklyn Studio With Room for Family and a Dog']

['Search for Homes for Sale or Rent']

['Sell Your Home']

That gives me a separate one-item list per title, not the single list I need. I'm stuck. Can you please help me with this part of the code?

Upvotes: 0

Views: 61

Answers (1)

idjaw

Reputation: 26580

On every iteration you overwrite articles with the next heading, so it only ever holds one string. What you want to do instead is make articles a list and append to it in each iteration:

import requests
from bs4 import BeautifulSoup 

# Fetch the page once and reuse the response.
webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")
articles = []
for story_heading in soup.find_all(class_="story-heading"):
    # Collect each cleaned heading instead of overwriting the previous one.
    articles.append(story_heading.text.replace('\n', '').replace('  ', ''))
print(articles)

The output is huge, so this is a small sample of what it looks like:

['Global Deal Reached to Curb Chemical That Warms Planet', 'Accord Could Push A/C Out of Sweltering India’s Reach ',....] 

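Furthermore, you only need to strip the leading and trailing whitespace (newlines included) from each heading; the two replace calls are unnecessary and can leave stray spaces behind, as in the second item above. So, you can do this with your story_heading.text instead: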

articles.append(story_heading.text.strip())
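
To see why strip() is the better fit, here is a minimal sketch comparing the two approaches on a made-up heading string. The value of raw is an assumption standing in for what story_heading.text might return; it is not taken from the live page.

```python
# Hypothetical heading text, padded the way scraped HTML often is.
raw = "\n   Sell Your Home \n"

# Original approach: drop newlines, then delete every pair of spaces.
# An odd run of spaces leaves one space behind.
replaced = raw.replace('\n', '').replace('  ', '')

# strip() removes all leading and trailing whitespace, newlines included,
# and leaves the spacing between words alone.
stripped = raw.strip()

print(repr(replaced))
print(repr(stripped))
```

The replace chain also deletes any double space that happens to sit between two words, which glues them together; strip() never touches the interior of the string.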

Written as a list comprehension, the final solution looks like this:

import requests
from bs4 import BeautifulSoup 

# Fetch the page once and reuse the response.
webpage = requests.get("http://www.nytimes.com/")
soup = BeautifulSoup(webpage.text, "html.parser")
# Build the list of cleaned headings in one expression.
articles = [story_heading.text.strip() for story_heading in soup.find_all(class_="story-heading")]
print(articles)
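
Once articles is a real list, the manipulations mentioned in the question work directly. The sketch below runs without network access: raw_headings is a hypothetical stand-in for the text of each story-heading element, filled with the sample titles from the question.

```python
import random

# Stand-in for story_heading.text values (sample data, not fetched live).
raw_headings = [
    "\n  Looking Back: 1980 | Funny, but Not Fit to Print\n",
    "\n  Brooklyn Studio With Room for Family and a Dog\n",
    "\n  Search for Homes for Sale or Rent\n",
    "\n  Sell Your Home\n",
]

# Same comprehension as above, applied to the stand-in data.
articles = [heading.strip() for heading in raw_headings]

# Pick one title at random from the list.
pick = random.choice(articles)
print(pick)
```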

Upvotes: 2
