Reading CDATA from XML file with BeautifulSoup

Question

I have tweets saved in an XML file as:


  142389495503925248
  ccifuentes
  
  2011-12-02T00:47:55
  es
  
   NONEAGREEMENT
  
  
   otros

To parse these, I created a BeautifulSoup instance via

soup = BeautifulSoup(xml, "lxml")

where xml is the raw XML file. To access a single tweet I did this:

tweets = soup.find_all('tweet')
for tw in tweets:
    print(tw)
    break

This results in


142389495503925248
ccifuentes

2011-12-02T00:47:55
es

NONEAGREEMENT


otros

Note that the CDATA section is missing when I print the first tweet. I need its contents. How can I get them?

宏杰李 · Accepted Answer

soup = bs4.BeautifulSoup(xml, 'xml')

Change the parser from "lxml" to "xml".

out:

Salgo de #VeoTV , que día más largoooooo...

Or use html.parser:

soup = bs4.BeautifulSoup(xml, 'html.parser')

out:
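To put the pieces together, here is a minimal sketch of pulling the CDATA text out of the first tweet. The sample document is an assumption: the question's markup did not survive, so every element name other than tweet (tweets, tweetid, user, content) is hypothetical, as is the helper name first_tweet_text. Only the CDATA text itself is taken from the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: element names other than <tweet> are assumed,
# because the original markup is not shown in the question.
SAMPLE_XML = """<tweets>
  <tweet>
    <tweetid>142389495503925248</tweetid>
    <user>ccifuentes</user>
    <content><![CDATA[Salgo de #VeoTV , que día más largoooooo...]]></content>
  </tweet>
</tweets>"""


def first_tweet_text(xml, parser="html.parser"):
    """Return the text of the (assumed) <content> element of the first tweet."""
    soup = BeautifulSoup(xml, parser)
    tweet = soup.find("tweet")
    # get_text() includes CDATA strings along with ordinary text nodes.
    return tweet.find("content").get_text()


print(first_tweet_text(SAMPLE_XML))
```

Passing parser="xml" instead should behave the same way, but that parser needs the lxml package installed, whereas html.parser ships with Python.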
