hikabikabu

Reputation: 33

Python: Changing bs4.element.ResultSet elements in list of lists to text

Hi everyone, I have extracted some HTML elements from a website using BeautifulSoup and find_all. As a result, I have a list of bs4.element.ResultSet objects (a list of lists) like this:

[[<li class="WlSsj w9uVi">neu</li>],
 [<li class="WlSsj w9uVi">neu</li>],
 [<li class="WlSsj w9uVi">neu</li>, <li class="WlSsj">Terrasse</li>],
 [<li class="WlSsj w9uVi">neu</li>,
  <li class="WlSsj">Terrasse</li>,
  <li class="WlSsj">Parkplatz</li>]]

I would now like to retrieve the text within the bs4 elements and keep the same nested list structure. I have been experimenting with two nested loops.

fet = []
for feat in features_bs:
    for fets in feat:
        fet.append(fets.text)
    features.append(fet)

The first loop iterates over every list (feat) within the original list (features_bs). The second iterates over every element (fets) in each inner list (feat) and converts it to text. I then wanted to append the text to an empty list (fet) while keeping the same lists-inside-lists structure as before. At the moment I only get one flat list of text, like this:

['neu',
 'neu',
 'neu',
 'Terrasse',
 'neu',
 'Terrasse',
 'Parkplatz']

However I would like the output to be:

[['neu'],
 ['neu'],
 ['neu', 'Terrasse'],
 ['neu'],
 ['Terrasse'],
 ['Parkplatz']]

Thanks for the help in advance.

Upvotes: 0

Views: 440

Answers (1)

HedgeHog

Reputation: 25073

You are close to your goal, but one temporary list is missing. Collect the text of each inner list in its own list (el) and append that list to the result:

fet = []
for feat in features_bs:
    el = []
    for fets in feat:
        el.append(fets.text)
    fet.append(el)
fet

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
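
If you prefer something more compact, the same two loops can probably be written as a nested list comprehension. This is only a sketch: FakeTag is a hypothetical stand-in for a bs4 element (all that matters is that it has a .text attribute), so the snippet runs without parsing a page.

```python
from dataclasses import dataclass


@dataclass
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag; only the .text attribute matters here."""
    text: str


# Same shape as features_bs: an outer list holding inner lists of elements.
features_bs = [
    [FakeTag("neu")],
    [FakeTag("neu"), FakeTag("Terrasse")],
    [FakeTag("neu"), FakeTag("Terrasse"), FakeTag("Parkplatz")],
]

# The inner comprehension plays the role of the temporary list `el`:
# it builds one fresh list of texts per inner list.
fet = [[fets.text for fets in feat] for feat in features_bs]
print(fet)
```

With real ResultSet objects, the comprehension line should work unchanged, since each element exposes .text in the same way.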

You could also streamline your process and build the expected format directly while parsing:

from bs4 import BeautifulSoup

html = '''
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>, <li class="WlSsj">Terrasse</li>
</ul>
<ul>
<li class="WlSsj w9uVi">neu</li>
</ul>
<ul>
<li class="WlSsj">Terrasse</li>
</ul>
<ul>
<li class="WlSsj">Parkplatz</li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
data = []
for ul in soup.find_all('ul'):
    el = []
    for e in ul.find_all('li'):
        el.append(e.text)  # append the text, not the Tag itself
    data.append(el)
data

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
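
The direct approach can also be condensed into a single comprehension. A minimal sketch, assuming each group of features sits in its own ul as in the sample HTML above (shortened here to three groups); get_text(strip=True) is used instead of .text so stray whitespace around the text is removed.

```python
from bs4 import BeautifulSoup

# Shortened sample markup; the real page structure is an assumption.
html = """
<ul><li class="WlSsj w9uVi">neu</li></ul>
<ul><li class="WlSsj w9uVi">neu</li>, <li class="WlSsj">Terrasse</li></ul>
<ul><li class="WlSsj">Parkplatz</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One inner list per <ul>, one text entry per <li> inside it.
data = [
    [li.get_text(strip=True) for li in ul.find_all("li")]
    for ul in soup.find_all("ul")
]
print(data)
```

This keeps one inner list per ul, so the nesting follows the page structure without any temporary variables.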

Upvotes: 1
