sy1vi3

Reputation: 181

Looking for the contents of a tag in BeautifulSoup, but it returns blank

The XML I'm trying to parse looks like this:

<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”<br> <span class="hashtag">#include</span> “robot-config.h”</p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”<br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>

I need the text between the opening and closing <dc:creator> tags, but when I search with

soup.find('dc:creator')

it simply returns

<dc:creator></dc:creator>

I think it might have something to do with the angle brackets (the <![CDATA[ ... ]]> wrapper) around the text, but I'm not sure.

How do I find the contents of the <dc:creator> tag with BeautifulSoup?

Upvotes: 4

Views: 102

Answers (2)

bigbounty

Reputation: 17408

Using html.parser gives the expected result. With the lxml parser the CDATA section is dropped and the tag comes back empty, as the session below shows:

In [1]: a = """<item>
   ...: <title>Port on brain, some functions not working</title>
   ...: <dc:creator>
   ...: <![CDATA[ @nathankmiles Nathan ]]>
   ...: </dc:creator>
   ...: <description>
   ...: <![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been
   ...:  using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code
   ...:  from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”<br> <span class="hashtag">#include</span>
   ...:  “robot-config.h”</p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> Lef
   ...: tFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMoto
   ...: r.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void )
   ...: {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( voi
   ...: d ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<b
   ...: r> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.dri
   ...: vercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the
   ...:  code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”<br> using namespace vex;</p>
   ...: <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode De
   ...: vices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false)
   ...: ;<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,tr
   ...: ue);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/t
   ...: asks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your
   ...: int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
   ...: </description>
   ...: <link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
   ...: <pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
   ...: <guid isPermaLink="false">www.vexforum.com-post-655101</guid>
   ...: </item>"""

In [2]: from bs4 import BeautifulSoup

In [3]: soup = BeautifulSoup(a, "lxml")

In [4]: soup.find('dc:creator')
Out[4]:
<dc:creator>
</dc:creator>

In [5]: soup = BeautifulSoup(a, "html.parser")

In [6]: soup.find('dc:creator')
Out[6]:
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>

In [7]: list(soup.find('dc:creator').children)
Out[7]: ['\n', ' @nathankmiles Nathan ', '\n']

In [8]: soup.find('dc:creator').text.strip()
Out[8]: '@nathankmiles Nathan'

Upvotes: 0

Andrej Kesely

Reputation: 195573

If the XML namespaces aren't defined in the document, the xml parser strips the undeclared prefix from the tag name. So you can search for the <creator> tag instead:

from bs4 import BeautifulSoup

txt = '''<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
<description>
<![CDATA[ <p>Sorry, I thought we had already included the code that would be needed. Here is what we have been using for testing. In the code below, the problem is on Port 1 (LeftFrontDriveMotor).</p> <p>Here is the code from main.cpp.</p> <p><span class="hashtag">#include</span> “vex.h”<br> <span class="hashtag">#include</span> “robot-config.h”</p> <p>using namespace vex;</p> <p>competition Competiton;</p> <p>void leftDrive() {<br> LeftFrontDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> LeftBackDriveMotor.spin(directionType::fwd, Controller1.Axis3.value(),velocityUnits::pct);<br> }</p> <p>void pre_auton( void ) {<br> // Initializing Robot Configuration. DO NOT REMOVE!<br> vexcodeInit();<br> }</p> <p>void autonomous( void ) {</p> <p>}</p> <p>void usercontrol( void ) {<br> while(true) {<br> Controller1.Axis3.changed(leftDrive);<br> }<br> }</p> <p>int main() {<br> pre_auton();</p> <p>Competiton.autonomous( autonomous );<br> Competiton.drivercontrol( usercontrol );</p> <p>while(true) {<br> vex::task::sleep(100);<br> }<br> }</p> <p>And, here is the code from robot-config.cpp</p> <p><span class="hashtag">#include</span> “vex.h”<br> using namespace vex;</p> <p>// A global instance of brain used for printing to the V5 brain screen<br> brain Brain;</p> <p>//VEXcode Devices<br> controller Controller1 = controller(primary);<br> motor LeftFrontDriveMotor (PORT1, ratio18_1,false);<br> motor LeftBackDriveMotor (PORT11, ratio18_1,false);<br> motor RightFrontDriveMotor (PORT10, ratio18_1,true);<br> motor RightBackDriveMotor (PORT20, ratio18_1,true);</p> <p>/**</p> <ul> <li>Used to initialize code/tasks/devices added using tools in VEXcode Text.</li> <li> </li><li>This should be called at the start of your int main function.<br> */</li> </ul> <p>void vexcodeInit(void) {<br> // Nothing to initialize<br> }</p> ]]>
</description>
<link>https://www.vexforum.com/t/port-on-brain-some-functions-not-working/83135/8</link>
<pubDate>Sun, 19 Jul 2020 16:38:10 +0000</pubDate>
<guid isPermaLink="false">www.vexforum.com-post-655101</guid>
</item>'''

soup = BeautifulSoup(txt, 'xml')
print(soup.find('creator').get_text(strip=True))

Prints:

@nathankmiles Nathan
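
For comparison, here is a minimal sketch of what changes when the dc prefix is declared. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, and it assumes the real feed declares the prefix on an enclosing element. The <rss> wrapper and the Dublin Core namespace URI are assumptions for illustration, since the snippet in the question doesn't show them.

```python
import xml.etree.ElementTree as ET

# Assumption: the usual Dublin Core namespace URI; not shown in the question.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: a hypothetical <rss> wrapper that declares the dc prefix.
feed = f"""<rss xmlns:dc="{DC_NS}">
<item>
<title>Port on brain, some functions not working</title>
<dc:creator>
<![CDATA[ @nathankmiles Nathan ]]>
</dc:creator>
</item>
</rss>"""

root = ET.fromstring(feed)

# Map the prefix to its URI so the 'dc:creator' path can be resolved.
creator = root.find("item/dc:creator", {"dc": DC_NS})

# The parser folds the CDATA content into the element's text.
creator_name = creator.text.strip()
print(creator_name)

# Without a declaration, a strict XML parser rejects the prefix outright.
undeclared_error = None
try:
    ET.fromstring("<item><dc:creator>x</dc:creator></item>")
except ET.ParseError as exc:
    undeclared_error = str(exc)
print(undeclared_error)
```

A strict parser refuses the undeclared prefix, which is why the bare snippet from the question needs either the lenient handling shown above or a declared namespace.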

Alternatively, with html.parser you can pick out the bs4.CData node directly (txt is the XML snippet from the question):

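from bs4 import BeautifulSoup, CData

soup = BeautifulSoup(txt, 'html.parser')

# Find the first CData string after the <dc:creator> tag.
# `string=` is the current name for the older `text=` argument.
creator = soup.find('dc:creator')
print(creator.find_next(string=lambda x: isinstance(x, CData)).strip())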

Prints:

@nathankmiles Nathan
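
The same idea of selecting only the CDATA node can be sketched with the standard library's xml.dom.minidom, which also keeps CDATA sections as their own node type. This is not BeautifulSoup, and it assumes the dc prefix is declared; the xmlns:dc attribute and its URI are added here for illustration and are not part of the question's snippet.

```python
from xml.dom import minidom

# Assumption: the usual Dublin Core namespace URI; not shown in the question.
DC_NS = "http://purl.org/dc/elements/1.1/"

doc = minidom.parseString(
    f'<item xmlns:dc="{DC_NS}">'
    "<dc:creator>\n<![CDATA[ @nathankmiles Nathan ]]>\n</dc:creator>"
    "</item>"
)

# getElementsByTagName matches on the qualified name, prefix included.
creator = doc.getElementsByTagName("dc:creator")[0]

# Keep only the CDATA children, skipping the surrounding whitespace text nodes.
cdata_parts = [
    node.data
    for node in creator.childNodes
    if node.nodeType == node.CDATA_SECTION_NODE
]

creator_name = "".join(cdata_parts).strip()
print(creator_name)
```

Filtering on the node type plays the same role as the isinstance(x, CData) check above: the newline text nodes around the CDATA section are ignored.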

Upvotes: 1
