user1911509

Reputation: 143

Parsing a file in PowerShell

I have the following raw content in a file, and I am trying to print just the list of all URLs. I have written part of a script: it reads the file with Get-Content and loops over the lines with ForEach, but I do not know how to extract just the URL from each line. Any thoughts?

Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",
Line 18994:         "url": "http://203.36.101.164/4f64555b4217b47b7c64b3fec19e389b/1502455203/Telstra/Foxtel-Vod/fxmultismvod5256/store2/ON307529/ON307529_hss.ism/QualityLevels(791000)/Fragments(video=9900000000)"
Line 19044:         "url": "https://www.gucci.com/int/en/ca/women/handbags/womens-shoulder-bags-c-women-handbags-shoulder-bags?filter=%3ANewest%3Acolors%3AGold%7Ccb9822",
Line 19096:         "url": "https://bagalio.cz/batohy-10l?cat=3p%3D1urceni%3D2582p%3D1kapsa_ntb_velikost%3D2179p%3D1manufacturer%3D1302p%3D1color%3D84p=1kapsa_ntb_velikost=2192",
Line 19148:         "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155",
Line 19200:         "url": "http://www.copiersupplystore.com/hp/color-laserjet-4700dn/j7934a-j7934ar",

Upvotes: 0

Views: 165

Answers (3)

TessellatingHeckler

Reputation: 28963

$Urls = Get-Content file.txt | ForEach-Object { $_.Split('"')[3] }
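To see why index 3 is the URL, here is a minimal sketch that splits one sample line from the question. It assumes the "Line 18942:" prefix is really part of each line in the file; the variable names $line, $parts and $url are only illustrative.

```powershell
# Sample line copied from the question (assumed to include the "Line 18942:" prefix).
$line = 'Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",'

# Splitting on the double quote yields five pieces:
#   [0] 'Line 18942:         '   [1] 'url'   [2] ': '   [3] the URL   [4] ','
$parts = $line.Split('"')
$url = $parts[3]

$url   # prints http://harvardpolitics.com/tag/brussels/
```

This relies on every line having the same quoting layout; a line with fewer than four double quotes would give $null (or a wrong piece) at index 3.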

Upvotes: 2

KarlGdawg

Reputation: 351

Try this out to just get the urls:

$content = Get-Content <file-with-output> # or other way of getting the data

$urls = $content | ForEach-Object { ($_ -replace ".+?(?=http.+)","").Trim('",')}

Edit: Added $urls to catch result.

Upvotes: 2

Olaf Reitz

Reputation: 694

One way is the Substring method; another is a regex.

$Text = Get-Content D:\Test\test.txt
foreach ($Line in $Text) {
    # SubString Version
    $FirstIndex = $Line.IndexOf('http')
    $URLLength = ($Line.LastIndexOf('"') - $FirstIndex)
    $Line.Substring($FirstIndex, $URLLength)

    # Regex Version
    $Regex = '(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)'
    ([regex]::Matches($Line,$Regex)).Value.TrimEnd('"')
}
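
As a hedged sketch of the regex route applied to all lines at once, the snippet below pipes sample lines through Select-String instead of looping. The two sample lines are copied from the question; the variable names ($sample, $urls) and the simplified http/https-only pattern are illustrative assumptions, not part of the answer above.

```powershell
# Two sample lines copied from the question (stand-ins for Get-Content output).
$sample = @(
    'Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",'
    'Line 19148:         "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155",'
)

# Match from http:// or https:// up to (not including) the next double quote or whitespace.
$urls = $sample |
    Select-String -Pattern 'https?://[^"\s]+' -AllMatches |
    ForEach-Object { $_.Matches.Value }

$urls
```

Because the pattern stops at the closing double quote, no trimming is needed afterwards, and lines that contain no URL simply produce no output.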

Upvotes: 2
