Advan721

Reputation: 55

"TypeError: string indices must be integers" error when parsing an HTML page with BeautifulSoup

I need to parse a price list that is stored in "span" tags with a "class" attribute. The numbers have a specific format (2 400 p), so I also need to remove the spaces and the letter "p". This is my code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(open("1.html"))
for link in soup.findAll("span", { "class" : "b-sbutton mod_price skin_product size_normal scheme_available" }):
    links = link.get_text()
    print(links)
    links_len = len(links)
    int(links_len)
for links_len in links:
    a = links[links_len]
    a.replace(' ', '')
    a.replace('р', '')
print(links)

But when I run the script, I get this error:

Traceback (most recent call last):
  File "get_data.py", line 9, in <module>
    a = links[links_len]
TypeError: string indices must be integers

What am I missing here?

Upvotes: 2

Views: 1737

Answers (1)

alecxe

Reputation: 473863

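You've mixed up lists, strings, and indexes. The loop "for links_len in links" iterates over the characters of the string, so links_len is a one-character string rather than an integer, and links[links_len] raises the TypeError. Also note that str.replace() returns a new string instead of changing the original, so its result has to be stored. You can do all of this with a list comprehension: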

from bs4 import BeautifulSoup

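# Name the parser explicitly; without it, bs4 picks one and emits a warning.
with open("1.html") as f:
    soup = BeautifulSoup(f, "html.parser")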
links = [link.text.replace(' ', '').replace('p', '') 
         for link in soup.find_all("span", 
                                   {"class": "b-sbutton mod_price skin_product size_normal scheme_available"})]
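
If you also want the prices as numbers, one option is to keep only the digit characters instead of listing every character to remove. That copes with regular and non-breaking spaces, and with both the Latin "p" and the Cyrillic "р". This is only a sketch: clean_price and sample_texts are hypothetical names that are not part of the original code, and it assumes every text contains at least one digit.

```python
def clean_price(text):
    """Turn a price string such as '2 400 р' into the integer 2400."""
    # Iterating over a string yields one-character strings, not integer
    # positions, which is why links[links_len] raised the TypeError.
    # String methods return new strings, so the result is kept in `digits`.
    digits = "".join(ch for ch in text if ch.isdecimal())
    # Assumption: at least one digit is present; int("") raises ValueError.
    return int(digits)


# Hypothetical sample values standing in for link.get_text() results.
sample_texts = ["2 400 р", "15 990 р", "\u00a0780 p"]
prices = [clean_price(text) for text in sample_texts]
print(prices)
```

In the real script you would probably call clean_price(link.get_text()) for each span returned by find_all.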

Upvotes: 2
