Reputation: 1
I am looking for a way to parse an RSS feed (XML) in PowerShell for a specific string. The RSS (shortened) looks like:
<channel>
<title>title here</title>
<link>http://link.com</link>
<description>this is a description</description>
<language>en-us</language>
<item>
<title>title1</title>
<description>URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>
<item>
<title>title2</title>
<description>URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>
<item>
<title>title3</title>
<description>URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>
I download the RSS and am able to extract the "description" field, which is the part I am interested in:
$rssFeed = [xml](New-Object System.Net.WebClient).DownloadString('http://url2feed.com/rss/')
$rssFeed.rss.channel.item | Select-Object description -First 5
Output is:
URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA
URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA
URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA
But I am only interested in the link after "URL:", e.g. url1.com/filenamehere. How can I drop the leading "URL:" and everything after the first comma in the "description" field? I would also like to prepend "http://" to every URL.
Upvotes: 0
Views: 98
Reputation: 313
Try the line below, which replaces "URL: " with "URL: http://":
$rssFeed.rss.channel.item | Select-Object @{Name = "title"; Expression = {$_.description -replace "URL: ","URL: http://"}} -First 5
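The line above keeps the "URL: " prefix and the text after the first comma. If only the URL itself is wanted, the same -replace idea can be extended with a capture group. This is a minimal sketch, assuming every description starts with "URL:" and contains at least one comma; the sample string is copied from the question.

```powershell
# Sample description taken from the question's feed
$description = 'URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA'

# Capture everything between "URL:" and the first comma, discard the rest,
# and prepend "http://" through the replacement string ($1 is the captured group)
$url = $description -replace '^URL:\s*([^,]+),.*$', 'http://$1'

$url   # http://url1.com/filenamehere
```

The same expression could be used inside the Expression script block of Select-Object in place of the simpler -replace.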
Upvotes: 0
Reputation: 674
This case is relatively simple, but I'll post a solution here that can also serve as an idea for more complicated cases.
Let's assume you want to work with one of your lines.
$line="URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA"
This line has multiple delimiters with spaces attached. Because it is relatively well structured, you can easily extract the information you want without writing a regular-expression pattern, by breaking it into segments at each delimiter.
For example, this returns the URL value:
$url=(($line -split ", ")[0] -split ": ")[1]
If the whitespace is not consistent, you can move that responsibility out of the delimiter and into a Trim() call, like this:
$url=(($line -split ",")[0].Trim() -split ":")[1].Trim()
In either case, $url will be
url3.com/filenamehere
and you can use it as you please, e.g.:
$url="http://$url"
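To apply this to every item in the feed rather than to a single line, the split can be wrapped in ForEach-Object. The following is a sketch only: the inline XML is a stand-in for the downloaded feed, with the rss/channel/item structure assumed from the question, and the limit of 2 on the second -split is an addition (not part of the lines above) so that a URL that itself contains a colon would not be cut short.

```powershell
# Inline stand-in for the downloaded feed (structure assumed from the question)
[xml]$rssFeed = @'
<rss>
  <channel>
    <item>
      <title>title1</title>
      <description>URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
    </item>
    <item>
      <title>title2</title>
      <description>URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
    </item>
  </channel>
</rss>
'@

$urls = $rssFeed.rss.channel.item | ForEach-Object {
    # Take the segment before the first comma, split it at the first colon only,
    # and trim the surrounding whitespace from the value
    $url = (($_.description -split ',')[0] -split ':', 2)[1].Trim()
    "http://$url"
}

$urls
# http://url1.com/filenamehere
# http://url2.com/filenamehere
```

With the real feed, the [xml] here-string would be replaced by the DownloadString call from the question.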
Upvotes: 1