gooly

Reputation: 1341

Why does removal of empty lines from multiline string in PowerShell fail using Replace function?

I am loading HTML emails. First I remove the HTML tags, then I replace each &nbsp; with a space and reduce double spaces to a single space; that works.

But now I have a lot of empty lines that I cannot remove. I have seen examples that remove empty lines while reading a file, but my file has no empty lines until after I remove the HTML tags and the spaces.

I do:

$m = [IO.File]::ReadAllText("$emailFolder\$fName")
$m = $m -replace "<((?!@).)*?>" # removes all HTML tags but keeps e-mail addresses in angle brackets
$m = $m -replace "&nbsp;"," "
$m = $m.Replace('  ',' ').Replace('  ',' ').Replace('  ',' ')
$m = $m.Replace('`r','').Replace('`n`n','`n').Replace('`n`n','`n') # does nothing :(

I tried various versions; none of them removed the empty lines. Any idea how I can achieve that?

Besides that, I tried to use a regex quantifier to match runs of spaces, and that failed too.

What am I doing wrong?

$m = $m.Replace(' +',' ')  # does not work
$m = $m.Replace('\s+',' ') # does not work either

Upvotes: 6

Views: 16421

Answers (6)

Edd

Reputation: 354

You need to include the -Raw switch, so that Get-Content returns the whole file as a single string:

$m = Get-Content "$emailFolder\$fName" -Raw #<- You need to include this
$m = $m -creplace '\s+', ' '
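The second half of the question (why $m.Replace(' +',' ') does nothing) comes down to the fact that .NET's String.Replace method searches for its first argument as literal text, while PowerShell's -replace/-creplace operators treat it as a regular expression. A minimal sketch, using a made-up sample string in place of the email:

```powershell
# Hypothetical sample text with runs of spaces and one CRLF line break
$sample = "one   two`r`nthree    four"

# String.Replace is a literal search: it looks for the two characters ' +' and finds none
$literal = $sample.Replace(' +', ' ')

# -creplace is regex-based: ' +' matches one or more spaces, line breaks are untouched
$spacesOnly = $sample -creplace ' +', ' '

# '\s+' also matches CR, LF and tabs, so the line break is collapsed into a space too
$allWhitespace = $sample -creplace '\s+', ' '

$literal        # unchanged
$spacesOnly     # "one two`r`nthree four"
$allWhitespace  # "one two three four"
```

Note that '\s+' flattens the whole text onto one line, which removes every line break and not just the empty lines; ' +' (or '[ \t]+') collapses only horizontal whitespace.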

Upvotes: 0

JohnyV

Reputation: 123

I know this is an old post, but I found another post with an easier method that others may benefit from. Take an array you imported using Get-Content, for example:

$array = Get-content C:\list.txt

$array displays:

Name 1
Name 2

Name 3

Name 4

Do this:

$array = $array | where-object {$_}

This gives the output you were after.

Source is http://techibee.com/powershell/remove-empty-items-from-array-in-powershell/2431
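Where-Object {$_} works because an empty string is falsy in PowerShell, but a line containing only spaces or tabs is a non-empty string and therefore survives the filter. A small variation (not from the linked post) that also drops whitespace-only lines, using a made-up array in place of C:\list.txt:

```powershell
# Hypothetical stand-in for: $array = Get-Content C:\list.txt
$array = @('Name 1', 'Name 2', '', 'Name 3', '   ', 'Name 4', "`t")

# Original filter: drops only truly empty strings
$nonEmpty = @($array | Where-Object { $_ })

# Variation: keep only lines that contain at least one non-whitespace character
$nonBlank = @($array | Where-Object { $_ -match '\S' })

$nonEmpty.Count   # 6
$nonBlank.Count   # 4
```

This works on an array of lines. For the single string returned by ReadAllText in the question, you would have to split it into lines first, for example with $m -split '\r?\n'.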

Upvotes: 1

briantist

Reputation: 47832

If I understand you correctly, you don't want to remove all line breaks, just "empty" lines (lines that consist of nothing but whitespace).

Consider this sample string:

$multiLine = "Line 1`r`nLine 2`nLine 3`r`n`r`n  `n `t `r`nLine 7`r`n"

When displayed, it will look like this on screen:

Line 1
Line 2
Line 3



Line 7

Line 4 is actually a blank line, with nothing but a CRLF. Line 5 is two spaces followed by a single LF. Line 6 is a space, a tab, a space, then a CRLF. I mixed line endings because HTML can be a mess; it's good to be prepared for anything!

To handle all of these, you can do a replace like this:

$multiLine -creplace '(?m)^\s*\r?\n',''

What Does This Do?

  1. -creplace is just the case-sensitive version of -replace (I like to be explicit).
  2. (?m) is an inline way to set regular expression modes. The m mode stands for multi-line, and it lets the ^ and $ anchors match the beginning/end of each line in a string (rather than the beginning and end of the string). This is the key to your issue, I think.
  3. We're using ^ to match the beginning of each line, then matching 0 or more whitespace characters using the \s class, which includes tabs (as well as CR and LF).
  4. We're matching an optional carriage return (for Windows line breaks), followed by a line feed. We don't need to match multiples of these because ^ will catch them throughout the string.

The Resulting Output

Line 1
Line 2
Line 3
Line 7
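Dropped into the question's pipeline, the pattern would go after the tag and &nbsp; cleanup. A hedged sketch: the HTML string below is made up and stands in for the [IO.File]::ReadAllText call, and the space-collapsing step uses a regex instead of the chained String.Replace calls:

```powershell
# Hypothetical email body standing in for [IO.File]::ReadAllText("$emailFolder\$fName")
$m = "<p>Hello&nbsp;there</p>`r`n<div>`r`n  <br>`r`n</div>`r`n<p>Bye</p>"

$m = $m -replace '<((?!@).)*?>'          # strip HTML tags (the question's pattern)
$m = $m -replace '&nbsp;', ' '           # &nbsp; -> space
$m = $m -creplace ' {2,}', ' '           # collapse runs of spaces with a regex
$m = $m -creplace '(?m)^\s*\r?\n', ''    # drop empty and whitespace-only lines

$m   # "Hello there`r`nBye"
```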

Upvotes: 21

Mekalikot

Reputation: 327

This works for me (I mean using -replace):

$message.Body = (Get-Content "C:\Documents\Folder\email.txt") | ForEach-Object {
    $_ -replace ('\[NAME\]', $name) `
       -replace ('\[AGE\]', $age) `
       -replace ('\[CITY\]', $city) `
       -replace ('\[STATE\]', $state) `
       -replace ('\[POSTAL\]', $postal)
}

Upvotes: 0

mjolinor

Reputation: 68301

This seems to work:

$m -replace '(?ms)(?:\r|\n)^\s*$'

Upvotes: 3

Knuckle-Dragger

Reputation: 7046

You are passing the backtick escapes inside single quotes; I got the same result until I switched to double quotes. PowerShell does not interpret backtick escape sequences in single-quoted strings, so '`r' is a literal backtick followed by an r, whereas in a double-quoted string "`r" is a carriage return.

I'd call this a feature, not a bug.

$m = "`r`n`n`r`r`n`r`n"

$m = $m.Replace("`r",'')
$m = $m.Replace("`n",'')

$m
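A short sketch of the quoting difference, applied to the kind of Replace chain used in the question. The sample string is made up:

```powershell
# Hypothetical text with two CRLF line breaks in a row
$text = "a`r`n`r`nb"

# Single quotes: '`n' is a literal backtick followed by n (2 characters)
'`n'.Length   # 2
# Double quotes: "`n" is a single line-feed character
"`n".Length   # 1

# Searching for the literal two-character text '`r' finds nothing, so nothing changes
$unchanged = $text.Replace('`r', '')

# Double quotes produce the real CR character, which String.Replace then removes
$noCr = $text.Replace("`r", '')

# Collapsing doubled line feeds also needs the real characters
$collapsed = $noCr.Replace("`n`n", "`n")   # "a`nb"
```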

Upvotes: 1
