zBernie

Reputation: 97

How to split a string into substrings?

I have a file named collection.xml that lists the movies in my Easter movie collection, shown below. Once I have these lines, I'm trying to use Split to split them on the strings "Movies/" and "</Path>". This would result in movie names like:

The Easter Bunny is Coming to Town (2006).mp4

I've been trying various permutations of the Split() method and the -split operator. How can I split the output below to get just the movie names, as shown above?

Get-Content .\collection.xml | Select-String Path

      <Path>/volume1/Media Library/Movies/Here Comes Peter Cottontail (1971).mp4</Path>
      <Path>/volume1/Media Library/Movies/Here Comes Peter Cottontail - The Movie (2005).mp4</Path>
      <Path>/volume1/Media Library/Movies/The Easter Bunny is Coming to Town (2006).mp4</Path>
      <Path>/volume1/Media Library/Movies/Its The Easter Beagle Charlie Brown (2008).mp4</Path>
      <Path>/volume1/Media Library/Movies/Hop (2011).mp4</Path>
      <Path>/volume1/Media Library/Movies/Peter Rabbit (2018).mp4</Path>

Upvotes: 1

Views: 105

Answers (2)

Callie J

Reputation: 31296

This is straightforward if you treat the file as XML rather than as plain text. It can be done in a single line; I've broken it across several lines for ease of reading:

Option 1:

([xml](get-content temp.txt)).SelectNodes("//Path") | foreach-object {
    [io.path]::GetFileNameWithoutExtension($_.'#text') 
}

This effectively:

  1. Reads the file in as XML
  2. Selects all the "Path" nodes in the file with a simple XPath expression; you may need to adjust this to better match your actual XML file
  3. For each node found, calls the .NET method [IO.Path]::GetFileNameWithoutExtension on the node's text to extract the file name without its extension
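The question's expected output keeps the ".mp4" extension, whereas GetFileNameWithoutExtension drops it. As a minimal sketch of the same steps using [IO.Path]::GetFileName instead: the inline sample and its <Collection> root element are assumptions standing in for the real file, and InnerText is used here in place of '#text' to read the node's text.

```powershell
# Hypothetical sample standing in for collection.xml; the <Collection> root is an assumption.
$sample = [xml]@'
<Collection>
  <Path>/volume1/Media Library/Movies/Hop (2011).mp4</Path>
  <Path>/volume1/Media Library/Movies/Peter Rabbit (2018).mp4</Path>
</Collection>
'@

# GetFileName returns the last path segment and keeps the extension.
$names = $sample.SelectNodes('//Path') | ForEach-Object {
    [IO.Path]::GetFileName($_.InnerText)
}
$names
```

Swapping GetFileName back to GetFileNameWithoutExtension gives the extension-free names that the commands in this answer produce.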

Option 2:

Pretty much the same, but using the native XML cmdlets, which may be easier to read:

(select-xml -xpath '//Path' -path .\temp.txt).Node | foreach-object { 
    [io.path]::GetFileNameWithoutExtension($_.'#text')
}

Again, tune the XPath to suit your XML file.

There are various ways to structure both of these to suit your taste (and exact XML format) by moving the ".Node" and ".'#text'" selectors inside or outside the foreach; for example, we can drop the parentheses around select-xml in the command above by moving .Node into the foreach:

select-xml -xpath '//Path' -path .\temp.txt | foreach-object { 
    [io.path]::GetFileNameWithoutExtension($_.Node.'#text')
}

...and variations on a theme. Your XML file structure can have a bearing on this; anything else is personal preference and readability.

Upvotes: 1

PowerShellGuy

Reputation: 801

Since I don't know what your full .xml file looks like, I made this with just the info you gave, and some simple regex.

I'm making a couple of assumptions here:

  1. The folder part of each path always ends in "Movies"
  2. You don't want the file type at the end of the string

$banana = Get-Content C:\Temp\collection.xml | Select-String Path

foreach($line in $banana)
{
    #load the line as an xml object, expand the path property, and replace the characters we don't want.
    ([xml]$line).Path -replace "^\/.+Movies\/|\..+$"

}

Those hieroglyphics after the -replace mean this

^ : Start of the line

\/ : Literal / character

. : Any character (except line terminators)

+ : At least one, but up to infinite

Movies : The literal string "Movies"

\/ : Literal / character

| : Or

\. : Literal . (period) character

.+ : . and + combined, meaning any character at least once, but up to infinite

$ : End of the line
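One thing to watch with this pattern: \..+$ starts at the first period in the remaining text, so a title that itself contains a period would be cut short. A hedged sketch of a stricter variant follows, using \.[^.]+$ to match only the final extension, together with the -split operator the question asked about for the case where the extension should stay. The second sample path is a made-up title used purely to show the edge case.

```powershell
# First path is from the question; the second is a hypothetical title containing a period.
$paths = @(
    '/volume1/Media Library/Movies/Hop (2011).mp4'
    '/volume1/Media Library/Movies/Mr. Example (2000).mp4'
)

# \.[^.]+$ matches only the last period and the characters after it.
$withoutExt = $paths -replace '^/.+Movies/|\.[^.]+$'

# -split on "Movies/" and take the last piece, which keeps the extension.
$withExt = $paths | ForEach-Object { ($_ -split 'Movies/')[-1] }

$withoutExt
$withExt
```

Both forms still assume every path contains "Movies/" immediately before the file name.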

Upvotes: 0
