mayord

Reputation: 19

Convert BeautifulSoup list output to string

I am scraping a site for some stats and getting the results I expect, but I can't get the final list output into a single string. I have searched and tried everything I can find: strip(), append(), replace('\n'), replace('\n\t\r'), and a few dozen other things. I also get an error at the end of the output, because the list contains some additional items I don't want.

Output I get:

81
79
55
12
76
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

Output I want:

81 79 55 12 76

Here is a sample of what I am scraping:

</li>, <li><span class="bp3-tag p p-81">81</span> f1</span>
</li>, <li><span class="bp3-tag p p-79">79</span> f2</span>
</li>, <li><span class="bp3-tag p p-55">55</span> f3</span>
</li>, <li><span class="bp3-tag p p-12">12</span> f4</span>
</li>, <li><span class="bp3-tag p p-76">76</span> f5</span>
[<li><span class="tooltip multiline" data-tooltip="some text i don't care about.">

My code looks like this, where a_stats is the list of fields being searched (f1, f2, ...):

dws = soup.find_all('div', {'class': 'col-3'})
more_lis = [div.find_all('li') for div in dws]
lis = soup.find_all('li') + more_lis
for li in lis:
       for stats in a_stats:
           if stats in li.text:
                t = re.findall('\d+', li.text)
                ti = (" ".join(t))
                print(ti)

I'm very much a novice, and this feels like it should be easy but I just can't get there yet. Help appreciated. Many thanks in advance.

Upvotes: 0

Views: 122

Answers (2)

user2668284

Here's an example based on reading the HTML from a file. It should be straightforward to adapt to your use case:

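from bs4 import BeautifulSoup

with open('/Users/andy/dummy.html') as html:
    soup = BeautifulSoup(html, 'html.parser')

vals = []
# Only look at <li> elements inside the col-3 divs
for div in soup.find_all('div', class_='col-3'):
    for li in div.find_all('li'):
        # strip() drops the newlines around each item's text
        vals.append(li.text.strip())
# Join once at the end so the items are separated by single spaces
print(' '.join(vals))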

Upvotes: 0

Debdut Goswami

Reputation: 1379

Instead of print(ti) try print(ti, end=" ")

EDIT

import re

dws = soup.find_all('div', {'class': 'col-3'})
more_lis = [div.find_all('li') for div in dws]
lis = soup.find_all('li') + more_lis
for li in lis:
    for stats in a_stats:
        try:
            if stats in li.text:
                t = re.findall(r'\d+', li.text)
                ti = " ".join(t)
                # end=" " keeps the values on one line
                print(ti, end=" ")
        except AttributeError:
            # The entries that come from more_lis are ResultSet objects, which have no .text
            pass

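Added a try/except block to handle the AttributeError. The error occurs because each entry of more_lis is a whole ResultSet (the return value of find_all()) rather than a single tag, so li.text fails for those entries.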
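
As an alternative to catching the AttributeError, the list can be flattened so that every entry is a single tag. The following is only a sketch under some assumptions: the HTML is made-up markup modelled on the sample in the question, and the field names "alpha" and "beta" are hypothetical placeholders for whatever a_stats really contains.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the sample in the question.
html = """
<div class="col-3"><ul>
<li><span class="bp3-tag p p-81">81</span> alpha</li>
<li><span class="bp3-tag p p-79">79</span> beta</li>
<li><span class="tooltip multiline" data-tooltip="ignored">no stat here</span></li>
</ul></div>
"""
a_stats = ["alpha", "beta"]  # placeholder field names (assumption)

soup = BeautifulSoup(html, "html.parser")
dws = soup.find_all("div", {"class": "col-3"})

# Flatten: one list of <li> tags instead of a list of ResultSets.
lis = [li for div in dws for li in div.find_all("li")]

values = []
for li in lis:
    # Keep only the items that mention one of the wanted fields.
    if any(stat in li.text for stat in a_stats):
        values.extend(re.findall(r"\d+", li.text))

# Build the final string once, after the loop.
result = " ".join(values)
print(result)
```

Collecting the numbers in a list and joining them afterwards also yields a real string that can be reused, rather than text that only exists in the printed output.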


The end argument of print decides what is written after the object is printed. By default it is \n, which is why each value ends up on a new line. Change it to a space, " ", and the values are printed on one line.
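
A small standalone illustration of the difference, using the numbers from the question as plain strings (no scraping involved). It shows print with end=" " next to str.join, which may be closer to what was asked since it produces an actual string:

```python
values = ["81", "79", "55", "12", "76"]

# Option 1: print each value followed by a space instead of a newline.
for v in values:
    print(v, end=" ")
print()  # finish the line

# Option 2: build one string and print (or reuse) it.
line = " ".join(values)
print(line)
```

Note that option 1 leaves a trailing space after the last value and only affects what is displayed; option 2 gives a string without the trailing space.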

Upvotes: 1
