moski

Reputation: 79

BeautifulSoup elements output to list

I am scraping a page with BeautifulSoup and printing the results.

  1. I need to convert the output from type bs4.element.Tag to a list and export that list into a DataFrame column named COLUMN_A.

  2. I want the output to stop at the 14th element (the last three h2 elements are useless).

My code:

import requests
from bs4 import BeautifulSoup


url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
attraction_place=soup.find_all('h2', class_="sitename")    

for attraction in attraction_place:
    print(attraction.text)
    type(attraction)

Output:

1  Vigeland Sculpture Park
2  Akershus Fortress
3  Viking Ship Museum
4  The National Museum
5  Munch Museum
6  Royal Palace
7  The Museum of Cultural History
8  Fram Museum
9  Holmenkollen Ski Jump and Museum
10  Oslo Cathedral
11  City Hall (Rådhuset)
12  Aker Brygge
13  Natural History Museum & Botanical Gardens
14  Oslo Opera House and Annual Music Festivals
Where to Stay in Oslo for Sightseeing
Tips and Tours: How to Make the Most of Your Visit to Oslo
More Related Articles on PlanetWare.com

I expect a list like:

attraction=[Vigeland Sculpture Park, Akershus Fortress, ......]

Thank you very much in advance.

Upvotes: 1

Views: 1858

Answers (3)

QHarr

Reputation: 84455

A nice, easy way is to take the alt attribute of the photos. This gives clean text and exactly the 14 attractions, with no need for slicing or indexing.

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm')
soup = bs(r.content, 'lxml')
attractions = [item['alt'] for item in soup.select('.photo [alt]')]
print(attractions)
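
To see what that selector does without requesting the site, here is a minimal sketch on a made-up HTML fragment. The markup is an assumption for illustration only and may not match the real page; '.photo [alt]' selects any descendant of an element with class "photo" that carries an alt attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page's markup may differ.
html = """
<div class="photo"><img src="a.jpg" alt="Vigeland Sculpture Park"></div>
<div class="photo"><img src="b.jpg" alt="Akershus Fortress"></div>
<div class="other"><img src="c.jpg" alt="Not an attraction"></div>
<div class="photo"><img src="d.jpg"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only images inside .photo that actually have an alt attribute are matched.
attractions = [item["alt"] for item in soup.select(".photo [alt]")]
print(attractions)
```

The image outside .photo and the image without alt are both skipped, which is why no slicing is needed with this approach.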

Upvotes: 1

Ourik gruzdev

Reputation: 126

You can use a slice to keep only the first 14 elements.

for attraction in attraction_place[:14]:
    print(attraction.text)
    type(attraction)
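
Combining the slice with the list and DataFrame column the question asks for could look roughly like this. It is a sketch that runs on a small inline HTML string rather than the live page; the fragment and the WANTED constant are assumptions for illustration (WANTED would be 14 for the page in the question).

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two wanted headings, one unwanted.
html = """
<h2 class="sitename">1  Vigeland Sculpture Park</h2>
<h2 class="sitename">2  Akershus Fortress</h2>
<h2 class="sitename">Where to Stay in Oslo for Sightseeing</h2>
"""
WANTED = 2  # assumption for this fragment; 14 for the real page

soup = BeautifulSoup(html, "html.parser")
attraction_place = soup.find_all("h2", class_="sitename")

# Slice the result set, then pull the text out of each Tag into a plain list.
attraction = [tag.get_text(strip=True) for tag in attraction_place[:WANTED]]

# One list becomes one DataFrame column.
df = pd.DataFrame({"COLUMN_A": attraction})
print(df)
```

find_all returns a list-like result set, so ordinary slicing works on it, and the list comprehension turns the Tag objects into strings before they reach pandas.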

Upvotes: 1

Robert Kearns

Reputation: 1706

# Collect the text of the first 14 headings only
new = []
for count, attraction in enumerate(attraction_place, start=1):
    if count > 14:
        break
    new.append(attraction.text)
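
If hard-coding 14 feels fragile, one alternative is to keep only the headings that start with a number and strip that number off. The sketch below works on plain strings copied from the question's output, so it assumes the numbering is part of each heading's text; clean_attractions is a hypothetical helper name, not part of any library.

```python
import re

# Sample of the heading texts from the question's output.
texts = [
    "1  Vigeland Sculpture Park",
    "2  Akershus Fortress",
    "14  Oslo Opera House and Annual Music Festivals",
    "Where to Stay in Oslo for Sightseeing",
    "More Related Articles on PlanetWare.com",
]

# A leading run of digits followed by whitespace marks a numbered attraction.
numbered = re.compile(r"^(\d+)\s+(.*)$")


def clean_attractions(headings):
    """Keep numbered headings only and drop their numeric prefix."""
    names = []
    for heading in headings:
        match = numbered.match(heading.strip())
        if match:
            names.append(match.group(2))
    return names


attraction = clean_attractions(texts)
print(attraction)
```

With the scraped tags, the input would be something like [tag.text for tag in attraction_place].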

Upvotes: 2
