Reputation: 1071
I want to parse the table at this URL and export it as a CSV file:
http://www.bde.es/webbde/es/estadis/fi/ifs_es.html
If I do this:
from urllib.request import urlopen
import bs4 as bs
sauce = urlopen(url_bank).read()
soup = bs.BeautifulSoup(sauce, 'html.parser')
and then this:
resto = soup.find_all('td')
lista_text = []
for elements in resto:
    lista_text = lista_text + [elements.string]
All the elements are parsed correctly except those in the last column, 'Códigos Isin', because the HTML in that column contains a line break (<br/>). I do not know how to deal with it. I have tried the following, but it still does not work:
lista_text = lista_text + [str(elements.string).replace('<br/>','')]
After that, I convert the list to an np.array and then to a DataFrame to export it as a .csv file. That part already works; I only need to fix this issue.
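The list-to-CSV step described above could look roughly like the sketch below. It assumes a hypothetical flat list read row by row, an assumed column count N_COLS and made-up column headers; none of these come from the actual table.

```python
import numpy as np
import pandas as pd

# Hypothetical flat list, one string per <td>, read row by row
lista_text = ["Banco A", "CODE1", "Banco B", "CODE2 CODE3"]
N_COLS = 2  # assumed number of columns, for illustration only

# Reshape the flat list into rows of N_COLS cells each
rows = np.array(lista_text).reshape(-1, N_COLS)
df = pd.DataFrame(rows, columns=["Entidad", "Códigos Isin"])  # hypothetical headers

# With no path, to_csv returns the CSV as a string;
# pass a file name such as "tabla.csv" to write a file instead
csv_text = df.to_csv(index=False)
print(csv_text)
```

The reshape only works if the list length is a multiple of N_COLS, so every row must contribute exactly one string per cell.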
Thanks in advance!
Upvotes: 3
Views: 3677
Reputation: 473873
You just need to be careful with what .string does: if a tag has more than one child, it returns None, which is what happens here because of the <br>. From the Beautiful Soup documentation:
If a tag contains more than one thing, then it’s not clear what .string should refer to, so .string is defined to be None.
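A minimal sketch of this behaviour, assuming a made-up table in which the second cell holds two placeholder codes separated by <br/> (CODE1 and CODE2 are hypothetical values, not real ISINs):

```python
from bs4 import BeautifulSoup

# Made-up HTML: the second cell mimics a 'Códigos Isin' cell containing a <br/>
html = "<table><tr><td>Banco A</td><td>CODE1<br/>CODE2</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all("td")

# .contents lists each cell's direct children:
# the first cell has one text node, the second has text, <br/>, text
children_counts = [len(td.contents) for td in cells]

# .string is only defined when there is exactly one child, so the second entry is None
strings = [td.string for td in cells]

print(children_counts)
print(strings)
```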
Use .get_text() instead:
for elements in resto:
    lista_text = lista_text + [elements.get_text(strip=True)]
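A sketch of the fix on the same kind of made-up cell (placeholder values again). One detail worth knowing: get_text(strip=True) joins the text pieces with an empty string by default, so two codes split by <br/> would run together. Passing a separator keeps them apart; the single space used here is just an illustrative choice.

```python
from bs4 import BeautifulSoup

# Made-up cell with two placeholder codes separated by <br/>
html = "<table><tr><td>Banco A</td><td>CODE1<br/>CODE2</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

lista_text = []
for elements in soup.find_all("td"):
    # get_text() collects every text node inside the cell, so the <br/> no longer
    # produces None; the " " separator keeps the two codes apart
    lista_text.append(elements.get_text(" ", strip=True))

# For comparison, the default empty separator glues the codes together
joined = soup.find_all("td")[1].get_text(strip=True)

print(lista_text)  # ['Banco A', 'CODE1 CODE2']
print(joined)      # CODE1CODE2
```

Using list.append instead of rebuilding the list with + also avoids copying the whole list on every iteration.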
Upvotes: 4