Reputation: 19
I am scraping a site for some stats and getting the results I expect, but I can't get the final list output into a single string. I searched and tried everything I could find: strip(), append(), replace('\n'), replace('\n\t\r'), and a few dozen other things. I also get an error at the end of the output, because the list contains some additional items I don't want.
Output I get:
81
79
55
12
76
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Output I want:
81 79 55 12 76
Here is a sample of what I am scraping:
</li>, <li><span class="bp3-tag p p-81">81</span> f1</span>
</li>, <li><span class="bp3-tag p p-79">79</span> f2</span>
</li>, <li><span class="bp3-tag p p-55">55</span> f3</span>
</li>, <li><span class="bp3-tag p p-12">12</span> f4</span>
</li>, <li><span class="bp3-tag p p-76">76</span> f5</span>
[<li><span class="tooltip multiline" data-tooltip="some text i don't care about.">
My code looks like this, where a_stats is the list of fields being searched (f1, f2, ...):
dws = soup.find_all('div', {'class': 'col-3'})
more_lis = [div.find_all('li') for div in dws]
lis = soup.find_all('li') + more_lis
for li in lis:
    for stats in a_stats:
        if stats in li.text:
            t = re.findall('\d+', li.text)
            ti = (" ".join(t))
            print(ti)
I'm very much a novice, and this feels like it should be easy but I just can't get there yet. Help appreciated. Many thanks in advance.
Upvotes: 0
Views: 122
Reputation:
Here's an example that reads the HTML from a file. The changes needed for your use case should be straightforward:
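from bs4 import BeautifulSoup

with open('/Users/andy/dummy.html') as html:
    vals = []
    soup = BeautifulSoup(html, 'html.parser')
    # Only look at <li> elements inside the col-3 divs
    divs = soup.find_all('div', class_='col-3')
    for div in divs:
        for li in div.find_all('li'):
            # strip() drops the newlines around each item's text
            vals.append(li.text.strip())
# Join once at the end so everything lands on one line
print(' '.join(vals))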
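One detail that is easy to miss: li.text keeps whatever whitespace the markup contains, including newlines, so replace() or strip() on the printed output never helps. The sketch below uses made-up strings standing in for li.text (an assumption, not the real page content) to compare concatenating the raw text with stripping each item before a single join.

```python
# Assumed stand-ins for li.text values; note the trailing newlines.
li_texts = ["81\n", "79\n", "55\n", "12\n", "76\n"]

# Concatenating the raw text keeps every embedded newline.
raw = "".join(text + " " for text in li_texts)

# Stripping each item first, then joining once, gives one clean line.
clean = " ".join(text.strip() for text in li_texts)

print(repr(raw))
print(clean)
```

Joining once at the end also avoids the trailing separator that appending ' ' to every item leaves behind.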
Upvotes: 0
Reputation: 1379
Instead of print(ti), try print(ti, end=" ").
EDIT
dws = soup.find_all('div', {'class': 'col-3'})
more_lis = [div.find_all('li') for div in dws]
lis = soup.find_all('li') + more_lis
for li in lis:
    for stats in a_stats:
        try:
            if stats in li.text:
                # Raw string so that \d is passed to the regex engine unchanged
                t = re.findall(r'\d+', li.text)
                ti = (" ".join(t))
                print(ti, end=" ")
        except AttributeError:
            # Entries that are ResultSets (not single tags) have no .text
            pass
I added a try/except block to handle the AttributeError.
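For context, the AttributeError appears because more_lis is a list of ResultSet objects (one per div), so after the concatenation lis mixes single tags with whole result sets, and a ResultSet has no .text. Since soup.find_all('li') already returns every li on the page, the div-level results are also duplicates. As an alternative to suppressing the error, here is a minimal sketch that builds one flat list of tags instead. The HTML snippet and the field labels (pace, shot, pass) are hypothetical stand-ins for the real page and for a_stats.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the scraped page.
html = """
<div class="col-3"><ul>
<li><span class="bp3-tag p p-81">81</span> pace</li>
<li><span class="bp3-tag p p-79">79</span> shot</li>
<li><span class="bp3-tag p p-55">55</span> pass</li>
<li><span class="tooltip multiline" data-tooltip="ignored">other</span></li>
</ul></div>
"""
a_stats = ["pace", "shot", "pass"]  # assumed field labels

soup = BeautifulSoup(html, "html.parser")

# Flatten: one list of <li> Tag objects, not a list of ResultSets.
lis = [
    li
    for div in soup.find_all("div", class_="col-3")
    for li in div.find_all("li")
]

numbers = []
for li in lis:
    text = li.get_text()
    # Keep only items that mention one of the wanted fields.
    if any(stat in text for stat in a_stats):
        numbers.extend(re.findall(r"\d+", text))

result = " ".join(numbers)
print(result)
```

Collecting the numbers in a list and joining them once also yields a real string (result) that can be reused later, which printing with end=" " does not.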
The end argument of print() decides what is written after the object is printed. By default it is \n, so you get a new line after every value. Change it to a space, " ", and that should be it.
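To see the effect of end in isolation, here is a small sketch that captures what print() writes for both settings. The values list reuses the numbers from the question; the render helper is only an illustrative name.

```python
import io
from contextlib import redirect_stdout

values = ["81", "79", "55", "12", "76"]  # sample numbers from the question


def render(end):
    """Print each value with the given `end` and return what was written."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        for value in values:
            print(value, end=end)
    return buffer.getvalue()


default_output = render("\n")  # same as plain print(value)
spaced_output = render(" ")    # same as print(value, end=" ")

print(repr(default_output))
print(repr(spaced_output))
```

Note that with end=" " the output finishes with a trailing space and no final newline, so a bare print() afterwards may be wanted to end the line.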
Upvotes: 1