Reputation: 511
I have two sets of operations: in the first one I look for files that contain a string, and in the second one I use that list to extract the lines that contain another string and then edit them.
$List_Of_Files = Get-ChildItem "$outputfolder*.html" -recurse |
    Select-String -pattern "https://www.youtube.com" | group path |
    select name -ExpandProperty Name
$List_Of_Titles = @(Get-Content $List_Of_Files | Where-Object { $_.Contains("<title>") }) |
    Foreach-Object {
        $content = $_ -replace " <title>", " <video:title>";
        $content -replace "</title>", "</video:title>"
    }
The code works as expected, but I need the first set of operations to write its results to a text file, and the second set to read that file and write its own results to another text file.
I have tried the following, but the second set doesn't create the file and doesn't give me any error either.
Get-ChildItem "$outputfolder*.html" -recurse |
    Select-String -pattern "https://www.youtube.com" | group path |
    select name -ExpandProperty Name | Set-Content "c:\List_Of_Files.txt"
@(Get-Content "c:\List_Of_Files.txt" | Where-Object { $_.Contains("<title>") }) |
    Foreach-Object {
        $content = $_ -replace " <title>", " <video:title>";
        $content -replace "</title>", "</video:title>"
    } | Set-Content "c:\list_of_titles.txt"
I have tried to modify it in different ways, but can't figure out how to make it work.
Upvotes: 1
Views: 2443
Reputation: 440162
c:\List_Of_Files.txt contains a list of file paths, and you're trying to filter that list by whether each path contains "<title>", which results in no matches.
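(Your 1st snippet worked because Get-Content $List_Of_Files received the file paths themselves as its path argument and therefore read the content of every listed file, whereas Get-Content "c:\List_Of_Files.txt" reads only the list file.)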
Your problem stems from confusion over what objects are being passed through the pipeline: you start with file paths (strings), then treat them as if they were the files' content.
Instead, I assume you meant to test the contents of each file identified by its path.
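The difference between filtering paths and filtering file content can be reproduced in isolation. The following is only a minimal sketch: the scratch folder and the file names page.html and List_Of_Files.txt inside it are made up for the demonstration and are not the paths from the question.

```powershell
# Hypothetical scratch folder and file names, used only for this demonstration.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('demo_' + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $dir | Out-Null

$page = Join-Path $dir 'page.html'
$list = Join-Path $dir 'List_Of_Files.txt'

Set-Content -Path $page -Value '<html>', ' <title>Demo</title>', '</html>'
Set-Content -Path $list -Value $page   # the list file holds only the *path*

# Reading the list file yields path strings, so the filter sees paths.
$fromList = @(Get-Content $list | Where-Object { $_.Contains('<title>') })

# Passing the paths themselves to Get-Content yields the lines of each file.
$paths = Get-Content $list
$fromFiles = @(Get-Content $paths | Where-Object { $_.Contains('<title>') })

"Matches when filtering paths:   $($fromList.Count)"
"Matches when filtering content: $($fromFiles.Count)"

# Clean up the scratch folder.
Remove-Item -Recurse -Force $dir
```

Under these assumptions the first count should be 0 and the second 1, which mirrors the symptom described in the question: no matches, no output, and no error.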
A quick fix would be:
Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ }
Note, however, that you must also adapt the ForEach-Object command accordingly:
Foreach-Object {
    # Read the content of the file whose path was given in $_,
    # and modify it.
    # (If you don't want to save the modifications, omit the `Set-Content` call.)
    $content = ((Get-Content $_) -replace " <title>", " <video:title>");
    $content = $content -replace "</title>", "</video:title>";
    # Save modifications back to the input file (if desired).
    Set-Content -Value $content -Path $_;
    # $content is the entire document, so to output only the title line(s)
    # we need to match again:
    $content -match '<video:title>'
    # Note: This relies on the title HTML element to be on a *single* line
    #       *of its own*, which may not be the case;
    #       if it isn't, you must use proper HTML parsing to extract it.
}
To put it all together:
Get-Content "c:\List_Of_Files.txt" | Where-Object { Select-String -Quiet '<title>' $_ } |
    Foreach-Object {
        $content = ((Get-Content $_) -replace " <title>", " <video:title>");
        $content = $content -replace "</title>", "</video:title>";
        Set-Content -Value $content -Path $_;
        $content -match '<video:title>'
    } | Set-Content "c:\list_of_titles.txt"
Note that you can make the whole command more efficient by removing the filtering step that uses Select-String and performing the filtering inside the ForEach-Object block.
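One way to fold the filter into the ForEach-Object block is sketched below, so that each file is read only once. The function name Convert-TitleTag and its parameters are hypothetical wrappers added to keep the sketch self-contained; the replacement logic is the same as in the command above.

```powershell
# Hypothetical helper: reads paths from $ListPath, rewrites the title tags in
# each listed file, and writes the rewritten title lines to $OutPath.
function Convert-TitleTag {
    param(
        [Parameter(Mandatory)] [string] $ListPath,
        [Parameter(Mandatory)] [string] $OutPath
    )

    Get-Content $ListPath | ForEach-Object {
        $path = $_

        # Read the file once; @() forces an array even for one-line files.
        $content = @(Get-Content $path)

        # Filtering step, now inside the block: skip files with no <title> line.
        if (@($content -match '<title>').Count -eq 0) { return }

        $content = $content -replace ' <title>', ' <video:title>' `
                            -replace '</title>', '</video:title>'

        # Save the modifications back to the input file (omit if not desired).
        Set-Content -Value $content -Path $path

        # Output only the rewritten title line(s).
        $content -match '<video:title>'
    } | Set-Content $OutPath
}

# Usage with the paths from the question:
# Convert-TitleTag -ListPath 'c:\List_Of_Files.txt' -OutPath 'c:\list_of_titles.txt'
```

Inside a ForEach-Object script block, return only ends processing of the current input object, so it acts as the "skip this file" step here.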
Also, the string replacement could be optimized or, preferably, handled with true HTML parsing.
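As a sketch of the "optimized replacement" half of that remark only, both tags can be rewritten in a single -replace call by capturing the optional slash. The sample line is made up, and unlike the original pattern this one does not require a space before <title>, so treat it as an assumption about what is acceptable for your files. It is still text matching, not HTML parsing.

```powershell
# Made-up sample line; real input would come from Get-Content.
$line = ' <title>My video</title>'

# (/?) captures the optional slash of the closing tag; $1 re-inserts it.
# The replacement string is single-quoted so PowerShell does not expand $1.
$result = $line -replace '<(/?)title>', '<$1video:title>'
$result
```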
Upvotes: 1