I'm trying to parse an XML file with XML::LibXML in Perl. This is my code so far:
use XML::LibXML;
my $parser = XML::LibXML->new();
print "READING: $this_path/feed_france.xml \n";
my $dom = $parser->parse_file("$this_path/feed_france.xml");
# Find all "item" elements inside the "channel" element
my @items = $dom->findnodes('/rss/channel/item');
# Loop through each item and extract its fields
foreach my $item (@items) {
    my $title       = $item->findvalue('title');
    my $company     = $item->findvalue('company');
    my $pubDate     = $item->findvalue('pubDate');
    my $summary     = $item->findvalue('description');
    my $description = $item->findvalue('content:encoded'); # full text
    my $guid        = $item->findvalue('guid');
    my $link        = $item->findvalue('link');
}
Here is a trimmed-down version of the XML in question:
<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
<title>The White House</title>
<atom:link href="https://www.whitehouse.gov/feed/" rel="self" type="application/rss+xml" />
<link>https://www.whitehouse.gov/</link>
<description></description>
<lastBuildDate>Mon, 27 Mar 2023 06:51:42 +0000</lastBuildDate>
<language>en-US</language>
<sy:updatePeriod>
hourly </sy:updatePeriod>
<sy:updateFrequency>
1 </sy:updateFrequency>
<image>
<url>https://www.whitehouse.gov/wp-content/uploads/2021/01/cropped-cropped-wh_favicon.png?w=32</url>
<title>The White House</title>
<link>https://www.whitehouse.gov/</link>
<width>32</width>
<height>32</height>
</image>
<item>
<title>FACT SHEET: Extreme MAGA Congressional Republicans Propose Handouts to Rich and Tax Hikes for Working Families</title>
<link>https://www.whitehouse.gov/briefing-room/statements-releases/2023/03/27/fact-sheet-extreme-maga-congressional-republicans-propose-handouts-to-rich-and-tax-hikes-for-working-families/</link>
<dc:creator><![CDATA[The White House]]></dc:creator>
<pubDate>Mon, 27 Mar 2023 10:00:00 +0000</pubDate>
<category><![CDATA[Statements and Releases]]></category>
<guid isPermaLink="false">https://www.whitehouse.gov/?p=72930</guid>
<description><![CDATA[<p>President Biden Asks the Super-Wealthy to Pay Their Fair Share and Cuts Taxes for Hardworking Families The President’s economic vision is to invest in America and grow the economy from the bottom up and middle out, not the top down. As part of his plan to cut costs for Americans and give families more breathing…</p>
<p>The post <a rel="nofollow" href="https://www.whitehouse.gov/briefing-room/statements-releases/2023/03/27/fact-sheet-extreme-maga-congressional-republicans-propose-handouts-to-rich-and-tax-hikes-for-working-families/">FACT SHEET: Extreme MAGA Congressional Republicans Propose Handouts to Rich and Tax Hikes for Working<span class="dewidow"> </span>Families</a> appeared first on <a rel="nofollow" href="https://www.whitehouse.gov">The White House</a>.</p>
]]></description>
<content:encoded><![CDATA[
<p class="has-text-align-center"><em>President Biden Asks the Super-Wealthy to Pay Their Fair Share and Cuts Taxes for Hardworking Families</em></p>
<li>Self-employed people and small business owners who don’t get health insurance through their jobs. In 2021, self-employed people and small business owners accounted for <a href="https://aspe.hhs.gov/sites/default/files/documents/36e5e989516728adcc63e398b3e3d23d/aspe-marketplace-coverage-economic-benefits.pdf">25 percent</a> of working-age people with ACA marketplace coverage.<br> </li>
<li><strong><u>Working families and middle-class retirees</u>. </strong>Some Congressional Republicans <a href="https://twitter.com/RepBuddyCarter/status/1632076524839895041">continue to push</a> a national retail sales tax <a href="https://www.congress.gov/bill/118th-congress/house-bill/25">bill</a> that would repeal most existing taxes and impose a new 30% sales tax on American families. That legislation would increase the debt by <a href="https://www.brookings.edu/2023/03/01/proposed-fairtax-rate-would-add-trillions-to-deficits-over-10-years/">trillions of dollars</a> and deliver massive tax cuts to the well-off — while <a href="https://www.americanprogress.org/article/the-fair-tax-act-would-radically-restructure-the-nations-tax-system-in-favor-of-the-wealthy/">increasing taxes</a> by $7,000 for a retired couple with $60,000 in Social Security income and by $6,000 for a single mom making $38,000 a year.</li>
</ul>
<p class="has-text-align-center">###</p>
<p>The post <a rel="nofollow" href="https://www.whitehouse.gov/briefing-room/statements-releases/2023/03/27/fact-sheet-extreme-maga-congressional-republicans-propose-handouts-to-rich-and-tax-hikes-for-working-families/">FACT SHEET: Extreme MAGA Congressional Republicans Propose Handouts to Rich and Tax Hikes for Working<span class="dewidow"> </span>Families</a> appeared first on <a rel="nofollow" href="https://www.whitehouse.gov">The White House</a>.</p>
]]></content:encoded>
</item>
</channel>
</rss>
I can't seem to access the content:encoded bit via:
$item->findvalue('content:encoded')
Am I doing it wrong? I've looked through the man page but can't see anything relevant. I assumed it's accessed the same way as any other tag, but maybe not? I'm sure I'm just missing something glaringly obvious!
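For comparison, here is a minimal sketch of a namespace-explicit version of the same lookup. It assumes the `content` prefix can be bound by hand with XML::LibXML::XPathContext and `registerNs`, and it uses a short inline stand-in document (hypothetical, not the real feed) so it can be run on its own. Whether this addresses the problem above is untested; it only shows two ways of addressing a prefixed element.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Inline stand-in for the feed (hypothetical, much shorter than the real file)
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Example title</title>
      <content:encoded><![CDATA[<p>Full body</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Bind the "content" prefix to its namespace URI explicitly,
# rather than relying on the declarations in the document
my $xpc = XML::LibXML::XPathContext->new($dom);
$xpc->registerNs(content => 'http://purl.org/rss/1.0/modules/content/');

foreach my $item ($xpc->findnodes('/rss/channel/item')) {
    # The second argument is the context node the XPath is evaluated against
    my $title = $xpc->findvalue('title', $item);
    my $full  = $xpc->findvalue('content:encoded', $item);

    # Prefix-free alternative: match on the local element name only
    my $alt = $item->findvalue('*[local-name()="encoded"]');

    print "TITLE: $title\n";
    print "FULL:  $full\n";
    print "ALT:   $alt\n";
}
```

The `local-name()` form ignores namespaces entirely, so it would also match an `encoded` element from any other namespace; the `registerNs` form is stricter.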