user3421341

Reputation: 111

Exclude HTML comment from text file

I have a config file from which I need to extract some text and convert it to CSV. I am stuck at the first step: the file contains a few HTML comments that must be excluded, and only the remaining text should be used for the CSV export.

The HTML comments look like the following:

<!--<add name=                                />
    <add name=                                />
    <add name=                                />-->

I have tried different regexes to solve this, but no luck. The closest I have got is excluding the first and third lines with the regex below, which doesn't solve the issue because the second line is still present:

(Get-Content -Path C:\Pathtothefile) -notmatch "^\s*(<!--)|>*(-->)$"

This regex takes out the lines that start with <!-- or end with -->, but not the middle line, which is also part of the comment. I have multiple files with multiple comments.

I have tried several other combinations (for example "<!--[^>]*(-->)$"), with no luck so far.

Upvotes: 2

Views: 512

Answers (2)

user6811411

This is written without knowing the content of your config file, and despite jscott's hint.

  • To have a regex match span several lines, you have to read the raw content as a single string (Get-Content -Raw) rather than as an array of lines.

Then you need to specify regex options so the pattern can match across line terminators:

  • SingleLine mode, s (. matches any character, including line feed), as well as
  • Multiline mode, m (^ and $ also match at embedded line terminators), e.g.
  • (?smi), where the "i" ignores case
  • a ? after .* to make the match ungreedy; otherwise the start of one comment could match all the way to the end of the last comment.

(Get-Content .\config.html -Raw) -replace '(?smi)^\<!--.*?--\>'

Checked this on Regex101
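
As a rough sketch of the same approach from start to finish, the snippet below uses a made-up sample string in place of (Get-Content .\config.html -Raw). It assumes comments may start anywhere on a line, so it drops the ^ anchor and keeps only the s option, then splits the result back into lines and discards the blank lines left where comments used to be. A greedy variant is included purely for contrast. All variable names are illustrative.

```powershell
# Made-up sample standing in for (Get-Content .\config.html -Raw)
$raw = @'
<!--<add name="a" />
    <add name="b" />
    <add name="c" />-->
a,b,c,d
<!-- single-line comment -->
1,2,3,4
'@

# (?s) lets . cross line breaks; the lazy .*? stops at the first --> after each <!--
$stripped = $raw -replace '(?s)<!--.*?-->'

# Split back into lines and drop the blank ones left behind by removed comments
$lines = @($stripped -split '\r?\n' | Where-Object { $_.Trim() -ne '' })

# For contrast: a greedy .* runs from the first <!-- to the LAST -->,
# so the a,b,c,d row between the two comments is removed as well
$greedy = @(($raw -replace '(?s)<!--.*-->') -split '\r?\n' | Where-Object { $_.Trim() -ne '' })

$lines    # a,b,c,d and 1,2,3,4
$greedy   # 1,2,3,4 only
```

The surviving lines can then be written out with Set-Content or parsed further, depending on how the CSV is to be built.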

Upvotes: 1

Zoredache

Reputation: 39603

In the documents you need to process, will the <!-- always be at the start of a line and the --> at the end? If so, you probably need to get the content and run it through a loop that processes the document line by line, toggling a state variable between comment and content.

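$data=@"
<!--<add name=                                />
    <add name=                                />
    <add name=                                />-->
a,b,c,d
1,2,3,4
"@
$state='content'
# Split on LF or CRLF so a trailing carriage return can't defeat '-->$'
$data -split "\r?\n" |
ForEach-Object {
  If ($_ -match '^<!--') {
    # Comment opens here; stay in 'comment' unless it also closes on this line
    If ($_ -notmatch '-->$') { $state='comment' }
    return  # `continue` doesn't work in ForEach-Object; a bare return emits nothing
  }
  If ($_ -match '-->$') {
    $state='content'
    return
  }
  If ($state -eq 'content') {
    $_  # emit lines that are outside a comment
  }
}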

Results

a,b,c,d
1,2,3,4
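
The same state-toggle idea could be wrapped in a pipeline function so that it can be fed straight from Get-Content. This is only a sketch: Remove-CommentLine is a hypothetical name, not a built-in cmdlet, and it assumes each comment opens at the start of a line (after optional whitespace) and closes at the end of one. The sample lines are made up.

```powershell
# Hypothetical helper (not a built-in cmdlet): drops every line that belongs
# to an HTML comment, tracking state across pipeline input
function Remove-CommentLine {
    param(
        [Parameter(ValueFromPipeline)]
        [string]$Line
    )
    begin { $inComment = $false }
    process {
        if (-not $inComment -and $Line -match '^\s*<!--') {
            # Comment opens; it stays open unless it also closes on this line
            $inComment = $Line -notmatch '-->\s*$'
            return
        }
        if ($inComment) {
            # Still inside a comment; leave it once the closing marker is seen
            if ($Line -match '-->\s*$') { $inComment = $false }
            return
        }
        $Line   # outside any comment: pass the line through
    }
}

# Made-up sample lines, including a comment that opens and closes on one line
$sample = '<!--<add name= />', '    <add name= />', '    <add name= />-->',
          'a,b,c,d', '<!-- one-liner -->', '1,2,3,4'
$result = @($sample | Remove-CommentLine)
$result   # a,b,c,d and 1,2,3,4
```

With a real file this would be used as Get-Content C:\Pathtothefile | Remove-CommentLine, which also avoids loading the whole file into memory at once.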

Upvotes: 2
