Peter Pavelka

Use a list of files saved in a text file to search for a string in PowerShell

I have two sets of operations. In the first, I look for files that contain a string; in the second, I use that list of files to extract lines that contain another string and then edit them.

$List_Of_Files = Get-ChildItem "$outputfolder*.html" -recurse | 
  Select-String -pattern "https://www.youtube.com" | group path | 
    select name -ExpandProperty Name

$List_Of_Titles = @(Get-Content $List_Of_Files | Where-Object { $_.Contains("<title>") }) | 
  Foreach-Object {
    $content = $_ -replace "    <title>", "  <video:title>";
    $content -replace "</title>", "</video:title>"
  }

The code works as expected, but I need the first set of operations to write its results to a text file, and the second set to read that file and write its own results to another text file.

I have tried the following, but the second set doesn't create its output file, and it doesn't report any error either.

Get-ChildItem "$outputfolder*.html" -recurse | 
  Select-String -pattern "https://www.youtube.com" | group path | 
    select name -ExpandProperty Name | Set-Content "c:\List_Of_Files.txt"

@(Get-Content "c:\List_Of_Files.txt" | Where-Object { $_.Contains("<title>") }) |
 Foreach-Object {
    $content = $_ -replace "    <title>", "  <video:title>";
    $content -replace "</title>", "</video:title>"
 } | Set-Content "c:\list_of_titles.txt"

I have tried to modify it in different ways, but can't figure out how to make it work.

Answers (1)

mklement0

c:\List_Of_Files.txt contains a list of file paths, so Get-Content "c:\List_Of_Files.txt" outputs those paths, and you're filtering them by whether each path contains "<title>", which results in no matches.
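(Your first snippet worked because there you passed the paths themselves to Get-Content, as in Get-Content $List_Of_Files, so it read the content of those files.)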

Your problem stems from confusion over what objects are being passed through the pipeline: you start with file paths (strings), then treat them as if they were the files' content.

Instead, I assume you meant to test the contents of each file identified by its path.
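
To make the difference concrete, here is a small self-contained sketch. It is a hypothetical demo, not part of the original code: it creates a throwaway folder ($dir) with one HTML file and a list file standing in for c:\List_Of_Files.txt, then runs the same Where-Object filter once on the list file's lines (the paths) and once on the listed files' lines.

```powershell
# Hypothetical demo data: a throwaway folder with one HTML file, plus a list
# file holding that file's path (standing in for c:\List_Of_Files.txt).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$page = Join-Path $dir 'page.html'
Set-Content -LiteralPath $page -Value '<html>', '    <title>Demo</title>', '</html>'
$listFile = Join-Path $dir 'List_Of_Files.txt'
Set-Content -LiteralPath $listFile -Value $page

# What the second snippet does: Get-Content on the list file emits the
# *paths*, and no path contains '<title>'.
$fromList = @(Get-Content -LiteralPath $listFile | Where-Object { $_.Contains('<title>') })

# What the first snippet did: pass the paths to Get-Content, which emits
# the *lines of those files*.
$paths = Get-Content -LiteralPath $listFile
$fromFiles = @(Get-Content -LiteralPath $paths | Where-Object { $_.Contains('<title>') })

"Lines matched when reading the list file: $($fromList.Count)"      # 0
"Lines matched when reading the listed files: $($fromFiles.Count)"  # 1

# Clean up afterwards, if desired: Remove-Item -LiteralPath $dir -Recurse
```

The only change between the two filters is what Get-Content is given: the list file itself, or the paths read from it.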

A quick fix would be:

Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ }

Note, however, that you must also adapt the ForEach-Object command accordingly:

Foreach-Object {
    # Read the content of the file whose path was given in $_,
    # and modify it.
    # (If you don't want to save the modifications, omit the `Set-Content` call.)
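    # @(...) ensures an array even for a one-line file, so that -match below
    # returns the matching line(s) rather than $true.
    $content = @(Get-Content -LiteralPath $_) -replace "    <title>", "  <video:title>";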
    $content = $content -replace "</title>", "</video:title>";
    # Save modifications back to the input file (if desired).
    Set-Content -Value $content -Path $_;
    # $content is the entire document, so to output only the title line(s) 
    # we need to match again:
    $content -match '<video:title>'
    # Note: This relies on the title HTML element being on a *single* line
    #       *of its own*, which may not be the case; 
    #       if it isn't, you must use proper HTML parsing to extract it.
 }

To put it all together:

Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ } | 
    Foreach-Object {
        $content = @(Get-Content -LiteralPath $_) -replace "    <title>", "  <video:title>";
        $content = $content -replace "</title>", "</video:title>";
        Set-Content -Value $content -Path $_;
        $content -match '<video:title>'
     } | Set-Content "c:\list_of_titles.txt"

Note that you can make the whole command more efficient by dropping the Select-String filtering step and performing the filtering inside the ForEach-Object block, so that each file is read only once rather than twice.
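
A minimal sketch of that single-pass variant follows; it is an illustration, not code from the answer. To keep it self-contained, it builds a hypothetical temp folder ($dir, with a.html and b.html) whose list and output files stand in for c:\List_Of_Files.txt and c:\list_of_titles.txt; in real use you would point $listPath and $titlesPath at those files and drop the setup lines.

```powershell
# Hypothetical setup: two HTML files, only one of which has a <title> line,
# plus the list file and the output file (stand-ins for the c:\ files).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$withTitle = Join-Path $dir 'a.html'
$noTitle   = Join-Path $dir 'b.html'
Set-Content -LiteralPath $withTitle -Value '<html>', '    <title>Demo</title>', '</html>'
Set-Content -LiteralPath $noTitle -Value '<html>', '</html>'
$listPath   = Join-Path $dir 'List_Of_Files.txt'   # stands in for c:\List_Of_Files.txt
$titlesPath = Join-Path $dir 'list_of_titles.txt'  # stands in for c:\list_of_titles.txt
Set-Content -LiteralPath $listPath -Value $withTitle, $noTitle

Get-Content -LiteralPath $listPath | ForEach-Object {
    # Read each file exactly once (@() keeps it an array even for a one-line file).
    $lines = @(Get-Content -LiteralPath $_)
    # The '<title>' filtering happens here instead of in a separate
    # Select-String pass; 'return' skips to the next path.
    if (-not ($lines -match '<title>')) { return }
    $content = $lines -replace '    <title>', '  <video:title>' -replace '</title>', '</video:title>'
    # Save the modifications back to the input file (omit if not wanted).
    Set-Content -LiteralPath $_ -Value $content
    # Output only the converted title line(s).
    $content -match '<video:title>'
} | Set-Content -LiteralPath $titlesPath
```

After this runs, list_of_titles.txt holds the converted title line from a.html, and b.html is left unchanged because it never reaches the replacement step.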

Also, the string replacement could be optimized or, preferably, handled with true HTML parsing.
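
As one possible optimization of the replacement (this sketch does not cover the HTML-parsing route), a single -replace with a capture group can rewrite the whole element in one step. It assumes, like the code above, that each <title> element sits on a single line; unlike the two-step version, it normalizes any leading indentation to two spaces rather than only shortening exactly four spaces. The sample lines are hypothetical.

```powershell
# Hypothetical sample lines; only the single-line <title> element should change.
$lines = @(
    '<html>'
    '    <title>My video</title>'
    '</html>'
)

# One regex instead of two -replace operations:
#   ^\s*                  leading indentation (replaced by two spaces)
#   <title>(.*?)</title>  the title text, captured as group 1
# Single quotes keep PowerShell from expanding $1 before the regex engine sees it.
$converted = $lines -replace '^\s*<title>(.*?)</title>', '  <video:title>$1</video:title>'

$converted
# <html>
#   <video:title>My video</video:title>
# </html>
```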
