Ray En

Reputation: 83

Extract data from HTML code

I want to use BeautifulSoup to extract the data inside div tags:

<div class="post contentTemplate" itemprop="text">Data to extract<div class="clear"></div></div>

Upvotes: 1

Views: 59

Answers (2)

Chiheb Nexus

Reputation: 9257

You can try something like this:

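from bs4 import BeautifulSoup as bs

data = '<div class="post contentTemplate" itemprop="text">Data to extract<div class="clear"></div></div>'
soup = bs(data, "html.parser")  # name the parser explicitly to avoid bs4's "no parser specified" warning
# Exact match against the full class attribute string "post contentTemplate"
m = soup.find_all("div", {"class": "post contentTemplate"})
for k in m:
    print(k.get_text())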

Output:

Data to extract
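A variant of the same approach using a CSS selector is sketched below. It assumes bs4 4.7 or later, where `select()` is available through the bundled soupsieve dependency. Unlike the exact string match above, `div.post.contentTemplate` matches any div whose class list contains both classes in any order, and `strip=True` trims surrounding whitespace from the extracted text.

```python
from bs4 import BeautifulSoup

data = '<div class="post contentTemplate" itemprop="text">Data to extract<div class="clear"></div></div>'
soup = BeautifulSoup(data, "html.parser")

# Matches divs whose class list includes both "post" and "contentTemplate",
# regardless of the order in which the classes are written.
for div in soup.select("div.post.contentTemplate"):
    print(div.get_text(strip=True))
```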

Upvotes: 1

odradek

Reputation: 1001

You can use the get_text() method, where html is the BeautifulSoup object built from the source code. This extracts all of the text from every div that find_all() finds:

data = [e.get_text() for e in html.find_all('div')]

When run on the HTML from the question (under Python 2, hence the u prefixes), it returns:

[u'Data to extract', u'']

If you don't want the empty values, filter them out:

data = [e.get_text() for e in html.find_all('div') if e.get_text()]
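A self-contained version of the snippets above is sketched below, assuming Python 3 and the built-in html.parser; `source` is a hypothetical name for the raw HTML string. It builds the `html` object the snippets rely on and calls get_text() only once per element in the filtered version.

```python
from bs4 import BeautifulSoup

source = '<div class="post contentTemplate" itemprop="text">Data to extract<div class="clear"></div></div>'
html = BeautifulSoup(source, "html.parser")

# Text of every <div> in document order; the outer div's text also
# includes the text of any nested divs.
all_text = [e.get_text() for e in html.find_all("div")]

# Same list with empty strings removed; strip=True also turns
# whitespace-only text into "" so it gets filtered out too.
non_empty = [t for t in (e.get_text(strip=True) for e in html.find_all("div")) if t]

print(all_text)   # ['Data to extract', '']
print(non_empty)  # ['Data to extract']
```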

Upvotes: 0
