Parse XML, remove unneeded strings, and write the result to a text file

I am looking for a way to parse an RSS feed (XML) in PowerShell and extract a specific string. The RSS (shortened) looks like this:

<channel>
<title>title here</title>
<link>http://link.com</link>
<description>this is a description</description>
<language>en-us</language>
<item>
<title>title1</title>
<description>URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>
<item>
<title>title2</title>
<description>URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>
<item>
<title>title3</title>
<description>URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA</description>
</item>

I download the RSS feed and can already extract the "description" field, which is the part I am interested in:

$rssFeed = [xml](New-Object System.Net.WebClient).DownloadString('http:/url2feed.com/rss/')
$rssFeed.rss.channel.item | Select-Object description -First 5

Output is:

URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA
URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA
URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA

But I am only interested in the link after "URL:", e.g. url1.com/filenamehere. How can I drop the leading "URL:" and everything after the first comma in the "description" field? I would also like to add "http://" in front of every URL.

Upvotes: 0

Views: 98

Answers (2)

Neechalkaran

Reputation: 313

Try the line below, which replaces "URL: " with "URL: http://":

$rssFeed.rss.channel.item | Select-Object @{Name = "title"; Expression = {$_.description -replace "URL: ","URL: http://"}} -First 5

Upvotes: 0

Alex Sarafian

Reputation: 674

This case is relatively simple, but I'll post a solution here as an idea for more complicated cases too.

Let's assume you want to work with one of your lines.

$line="URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA"

This line has multiple delimiters with spaces attached. Because it is relatively well structured, you can easily extract the information you want without a regular expression by splitting it into segments per delimiter.

For example, this returns the URL value:

$url=(($line -split ", ")[0] -split ": ")[1]

If the whitespace is not always consistent, you can move that responsibility out of the delimiter and into a Trim call, like this:

$url=(($line -split ",")[0].Trim() -split ":")[1].Trim()

In either case the $url will be

url3.com/filenamehere

and you can use it as you please, e.g.:

$url="http://$url"
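To tie this back to the question, here is a minimal sketch that applies the same split to every description and writes the results to a text file. The sample strings are copied from the question's output; the output file name urls.txt is an assumption, and limiting the second split to two parts is a small variation on the line above so that a colon inside the URL itself would not cut it short.

```powershell
# Sample description values, copied from the question's output.
# With the live feed you would pipe $rssFeed.rss.channel.item instead
# and read $_.description inside the script block.
$descriptions = @(
    'URL: url1.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA'
    'URL: url2.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA'
    'URL: url3.com/filenamehere, IP Address: 123.123.123.123.123.123, Country: AA'
)

$urls = $descriptions | ForEach-Object {
    # Keep the text before the first comma, then split once on the first colon
    # and trim the surrounding whitespace from the value.
    $url = (($_ -split ',')[0] -split ':', 2)[1].Trim()
    "http://$url"
}

# urls.txt is a hypothetical output path; one URL is written per line.
$urls | Set-Content -Path 'urls.txt'
```

Each entry in $urls is a plain string, so the same pipeline could just as well feed Out-File or further filtering instead of Set-Content.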

Upvotes: 1
