Get information in sub-tags

Question

I'm trying to retrieve information from a site by web scraping. The information I need is in sub-tags, but I can't get to it.


 
 House
 3
 pièces,
 74 m²

 New York (11111)

,

 Appartement
 3
 pièces,
 64 m²

 Los Angeles (22222)

 House
 4
 pièces,
 81 m²

 Chicago (33333)

I'm trying to get the ad and the city. I tried:

# BeautifulSoup
from bs4 import BeautifulSoup
import requests

# to get: House 3 pièces, 74 m²
ad = [ad.get_text() for ad in soup.find_all("span", class_='ergov3-txtannonce')]

# to get cities
cities = [city.get_text() for city in soup.find_all("cite", class_='ergov3-txtannonce')]

My output:

[]
[]

Expected output:

["House 3 pièces, 74 m²", "Appartement 3 pièces, 64 m²", "House 4 pièces, 81 m²"]
["New York (11111)", "Los Angeles (22222)", "Chicago (33333)"]

HedgeHog · Accepted Answer

Assuming your soup contains the provided HTML, select the elements that hold your information and iterate over the ResultSet to scrape it. Avoid multiple lists; try to scrape all the information in one go and store it in a more structured way:

...
data = []

for e in soup.select('.ergov3-txtannonce'):
    data.append({
        'title':e.span.get_text(strip=True),
        'city':e.cite.get_text(strip=True)
    })
...
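One caveat with the attribute-style access above: `e.span` and `e.cite` return `None` when a listing lacks that child, and calling `.get_text()` on `None` raises `AttributeError`. A minimal sketch of a more defensive variant follows. The `extract_ads` helper, the `div` container tag, and the second sample listing are made up for illustration; only the class name and the `span`/`cite` children come from the answer.

```python
from bs4 import BeautifulSoup

def extract_ads(html_text):
    """Return one dict per listing, tolerating a missing <span> or <cite>."""
    soup = BeautifulSoup(html_text, "html.parser")
    rows = []
    for e in soup.select(".ergov3-txtannonce"):
        title = e.find("span")   # None if the listing has no <span>
        city = e.find("cite")    # None if the listing has no <cite>
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "city": city.get_text(strip=True) if city else None,
        })
    return rows

# Hypothetical markup: the second listing deliberately has no <cite>.
sample = '''
<div class="ergov3-txtannonce"><span>House 3 pièces, 74 m²</span><cite>New York (11111)</cite></div>
<div class="ergov3-txtannonce"><span>Studio 1 pièce, 20 m²</span></div>
'''
rows = extract_ads(sample)
print(rows)
```

Keeping the `None` in the row, rather than skipping the listing, preserves the one-dict-per-ad structure so incomplete ads stay visible downstream.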

Note: If the elements are not present in your soup, the website's content may be provided dynamically by JavaScript. That would be a good fit for a new question with exactly that focus.
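A quick way to tell the two cases apart is to test whether the selector matches anything in the raw HTML that was downloaded. The sketch below assumes a hypothetical helper, `selector_present`, and two made-up page strings; it only shows the check itself, not a fix for JavaScript-rendered pages.

```python
from bs4 import BeautifulSoup

def selector_present(html_text, selector=".ergov3-txtannonce"):
    """True if at least one element matching `selector` is in the raw HTML."""
    soup = BeautifulSoup(html_text, "html.parser")
    return soup.select_one(selector) is not None

# Made-up responses: one with the listing markup, one that is only a JS shell.
static_page = '<div class="ergov3-txtannonce"><span>House</span></div>'
js_shell = '<div id="app"></div><script src="bundle.js"></script>'

print(selector_present(static_page))  # True
print(selector_present(js_shell))     # False
```

In practice you would pass the text of your own response (for example `requests.get(url).text`, with `url` being the page you scrape). A `False` result suggests the listings are not in the served HTML at all.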

Example

from bs4 import BeautifulSoup

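# Markup reconstructed for this example: the class name and the span/cite
# children follow from the selectors below; the div container is an assumption.
html='''
<div class="ergov3-txtannonce">
 <span>House 3 pièces, 74 m²</span>
 <cite>New York (11111)</cite>
</div>
,
<div class="ergov3-txtannonce">
 <span>Appartement 3 pièces, 64 m²</span>
 <cite>Los Angeles (22222)</cite>
</div>
<div class="ergov3-txtannonce">
 <span>House 4 pièces, 81 m²</span>
 <cite>Chicago (33333)</cite>
</div>
'''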
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

data = []

for e in soup.select('.ergov3-txtannonce'):
    data.append({
        'title':e.span.get_text(strip=True),
        'city':e.cite.get_text(strip=True)
    })

data

Output

[{'title': 'House 3 pièces, 74 m²', 'city': 'New York (11111)'},
 {'title': 'Appartement 3 pièces, 64 m²', 'city': 'Los Angeles (22222)'},
 {'title': 'House 4 pièces, 81 m²', 'city': 'Chicago (33333)'}]
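As for why the attempt in the question returned empty lists: `find_all("span", class_='ergov3-txtannonce')` looks for a `span` that itself carries the class, whereas the selectors above imply the class sits on a parent element. If two separate lists are still wanted, descendant CSS selectors are one option. This is a sketch under that assumption, with a `div` container chosen for illustration.

```python
from bs4 import BeautifulSoup

# Assumed structure: the class is on the container, not on <span>/<cite>.
html = '''
<div class="ergov3-txtannonce">
  <span>House 3 pièces, 74 m²</span>
  <cite>New York (11111)</cite>
</div>
<div class="ergov3-txtannonce">
  <span>Appartement 3 pièces, 64 m²</span>
  <cite>Los Angeles (22222)</cite>
</div>
'''
soup = BeautifulSoup(html, "html.parser")

# The question's approach: requires the class on the <span> itself, so no match.
direct = list(soup.find_all("span", class_="ergov3-txtannonce"))

# Descendant selectors: any <span>/<cite> nested inside the classed element.
ads = [s.get_text(strip=True) for s in soup.select(".ergov3-txtannonce span")]
cities = [c.get_text(strip=True) for c in soup.select(".ergov3-txtannonce cite")]

print(direct)
print(ads)
print(cities)
```

Note that two parallel lists can drift out of alignment if one listing is missing its `cite`, which is one reason the per-element dictionaries above are the safer structure.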
