Reputation: 1336
I'm trying to parse the data from an RSS feed using PowerShell.
How do I get the contents of the title, guid, and content:encoded fields?
For some reason, my code below just returns "...".
Any help greatly appreciated!
[xml]$hsg = Invoke-WebRequest http://technet.microsoft.com/en-us/security/rss/comprehensive
#$hsg.rss.channel.item | select title #this prints the list of blog posts
$ContentNamespace = New-Object Xml.XmlNamespaceManager $hsg.NameTable
$ContentNamespace.AddNamespace("content", "http://purl.org/rss/1.0/modules/content/")
#$hsg.rss.channel.item #this prints the list of posts
$hsg.rss.channel.item.selectSingleNode("content:encoded", $ContentNamespace)
The data looks like:
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rssdatehelper="urn:rssdatehelper" version="2.0">
<channel>
<title>Microsoft Security Content: Comprehensive Edition</title>
<link>http://technet.microsoft.com/security/bulletin</link>
<dc:date>Wed, 15 May 2013 08:00:00 GMT</dc:date>
<generator>umbraco</generator>
<description>Microsoft Security Content: Comprehensive Edition</description>
<language>en-US</language>
<item>
<title>
MS13-045 - Important : Vulnerability in Windows Essentials Could Allow Information Disclosure (2813707) - Version: 1.1
</title>
<link>
http://technet.microsoft.com/en-us/security/bulletin/ms13-045
</link>
<dc:date>2013-05-15T07:00:00.0000000Z</dc:date>
<guid>
http://technet.microsoft.com/en-us/security/bulletin/ms13-045
</guid>
<content:encoded>
<![CDATA[
Severity Rating: Important<br />
Revision Note: V1.1 (May 15, 2013): Corrected link to the download location in the Detection and Deployment Tools and Guidance section. This is an informational change only.<br />
Summary: This security update resolves a privately reported vulnerability in Windows Writer. The vulnerability could allow information disclosure if a user opens Writer using a specially crafted URL. An attacker who successfully exploited the vulnerability could override Windows Writer proxy settings and overwrite files accessible to the user on the target system. In a web-based attack scenario, a website could contain a specially crafted link that is used to exploit this vulnerability. An attacker would have to convince users to visit the website and open the specially crafted link.
]]>
</content:encoded>
</item>
Thanks!
Upvotes: 0
Views: 2396
Reputation: 396
You can skip writing the output to an intermediate file and reading it back with Get-Content. Frode, who provided the solution, packages the results nicely into a psobject.
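Clear-Host
# Download the feed and cast the response body (.Content) straight to XML; no temp file needed
$x = [xml](Invoke-WebRequest 'https://technet.microsoft.com/en-us/security/rss/comprehensive').Content
foreach ($y in $x.rss.channel.SelectNodes('//item')) {
    "`r`n`t$($y.title)"            # blank line, then the tab-indented title
    $y.pubdate                      # publication date, if the feed has a <pubDate> element
    $y.link
    $y.encoded.'#cdata-section'     # text inside <content:encoded><![CDATA[...]]>
}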
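For comparison with the XmlNamespaceManager approach in the question, here is a minimal sketch (not taken from either answer) that registers the content prefix, calls SelectSingleNode on each <item> individually, and reads InnerText, which returns the CDATA text itself. It assumes a trimmed inline copy of the sample item so it runs without downloading the feed; the variable names are illustrative only.

```powershell
# Assumption: a trimmed inline copy of the question's single <item>
$sample = @'
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
<channel>
<item>
<title>MS13-045 - Important</title>
<guid>http://technet.microsoft.com/en-us/security/bulletin/ms13-045</guid>
<content:encoded><![CDATA[Severity Rating: Important<br />]]></content:encoded>
</item>
</channel>
</rss>
'@

# Casting a string straight to [xml] needs no intermediate file
$doc = [xml]$sample

# Register the "content" prefix so XPath can resolve content:encoded
$ns = New-Object System.Xml.XmlNamespaceManager $doc.NameTable
$ns.AddNamespace('content', 'http://purl.org/rss/1.0/modules/content/')

# Query each <item> on its own; InnerText returns the text, including CDATA content
$results = foreach ($item in $doc.SelectNodes('//item')) {
    [pscustomobject]@{
        Title   = $item.SelectSingleNode('title').InnerText.Trim()
        Guid    = $item.SelectSingleNode('guid').InnerText.Trim()
        Content = $item.SelectSingleNode('content:encoded', $ns).InnerText.Trim()
    }
}
$results | Format-List
```

Querying per item keeps each SelectSingleNode call scoped to one <item>, so every result lines up with its own title and guid.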
You may find that your RSS/Atom feed has a slightly different structure; for a different feed I needed this instead:
foreach ($y in $x.feed.entry)
IntelliSense in the IDE helped me navigate the XML structure.
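The RSS-versus-Atom difference can be wrapped in a small helper. This is a sketch under assumptions: Get-FeedEntry is a hypothetical name, and it only distinguishes an RSS 2.0 root (<rss><channel><item>) from an Atom root (<feed><entry>); any other layout would need its own branch.

```powershell
# Hypothetical helper: return the entries of either an RSS 2.0 or an Atom feed
function Get-FeedEntry {
    param([xml]$Feed)

    if ($Feed.rss) {
        # RSS 2.0: <rss><channel><item>
        return $Feed.rss.channel.item
    }

    # Atom: <feed><entry>. PowerShell's XML adapter exposes elements by local name,
    # so the Atom default namespace does not get in the way of dot access.
    return $Feed.feed.entry
}

# Usage: strings are converted to [xml] by the parameter type
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Example entry</title></entry></feed>'
Get-FeedEntry $atom | ForEach-Object { $_.title }
```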
Upvotes: 0
Reputation: 54941
Try this:
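$rss = [xml](Get-Content .\test.rss)
# Assign the pipeline output to $posts (always an array thanks to @())
# instead of using += on a variable that was never initialized
$posts = @($rss.SelectNodes('//item') | % {
    [pscustomobject]@{
        Title = $_.Title.Trim()
        Guid = $_.Guid.Trim()
        # content:encoded is exposed by its local name; the text is in the CDATA child
        Content = $_.Encoded."#cdata-section".Trim()
    }
})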
Sample of the parsed data (the array contains only one item, since there was only one in your sample):
$posts
Title Guid Content
----- ---- -------
MS13-045 - Important : Vulnera... http://technet.microsoft.com/... Severity Rating: Important<br...
BTW, your sample was missing the following at the end:
</channel>
</rss>
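Because the sample stops before </channel></rss>, casting it to [xml] fails. A minimal sketch for checking that before parsing; Test-XmlString is a hypothetical helper, and the snippets are trimmed stand-ins for the question's data.

```powershell
# Hypothetical helper: report whether a string parses as XML, warning with the reason if not
function Test-XmlString {
    param([string]$Text)
    try {
        $null = [xml]$Text
        return $true
    }
    catch {
        # A missing closing tag makes the [xml] cast throw; surface the parser's message
        Write-Warning $_.Exception.Message
        return $false
    }
}

$truncated = '<rss version="2.0"><channel><item><title>MS13-045</title></item>'
$complete  = $truncated + '</channel></rss>'

Test-XmlString $truncated   # False, plus a warning
Test-XmlString $complete    # True
```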
Upvotes: 3