James King

Reputation: 1614

Using sed to delete lines that contain non-alphabetic characters

The following Regex works as expected in Notepad++:

^.*[^a-z\r\n].*$

However, when I try to use it with sed, it won't work:

sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt

Upvotes: 0

Views: 1516

Answers (2)

AdrianHHH

Reputation: 14038

Two things:

Sed is a stream editor. It processes one line of the input at a time. That means the search and replace commands, etc, can only see the current line. By contrast, Notepad++ has the whole file in memory and so its search expressions can span two or more lines.

Your command sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt includes \( and \). With the -r flag these mean literal round brackets, not a group. So the command looks for a ( immediately before the start of the line and a ) immediately after its end, which can never match, so nothing is replaced. Rewriting the command as sed -r 's/^.*[^a-z\r\n].*$//g' wordlist.txt should have the desired effect. If the file uses Unix line endings, you could also remove the \r\n to give sed -r 's/^.*[^a-z].*$//g' wordlist.txt. But both of these leave an empty line where each matching line was. So if you want those lines removed entirely, the command sed -r '/^.*[^a-z].*$/d' wordlist.txt is probably closer to what you really want.
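The difference between substituting and deleting can be seen on a small made-up sample. The three words below are illustrative only (they are not from the original wordlist.txt), and the sketch assumes GNU sed and Unix line endings:

```shell
# Hypothetical sample input standing in for wordlist.txt.
sample=$(printf 'apple\nb4nana\ncherry')

# s///: the offending line is emptied, but an empty line stays behind.
substituted=$(printf '%s\n' "$sample" | sed -r 's/^.*[^a-z].*$//')

# d: the offending line is dropped from the output altogether.
deleted=$(printf '%s\n' "$sample" | sed -r '/^.*[^a-z].*$/d')

printf 'substituted:\n%s\n' "$substituted"
printf 'deleted:\n%s\n' "$deleted"
```

The first result still has three lines (the middle one empty), while the second has only two.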

Upvotes: 1

beny23

Reputation: 35018

You could use:

sed -i '/[^a-z]/d' wordlist.txt

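This will delete every line that contains a character other than a lowercase letter a-z, editing the file in place. There is no need to mention linefeeds, because sed strips the trailing newline before matching.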
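Because -i overwrites the file, it may be worth trying the command on a copy first. The following is a rough sketch only: the scratch directory and sample words are made up, and it assumes a sed that accepts a backup suffix attached to -i (GNU sed does):

```shell
# Scratch directory and sample contents are hypothetical, for illustration only.
workdir=$(mktemp -d)
printf 'apple\nb4nana\nCherry\nplum\n' > "$workdir/wordlist.txt"

# -i.bak edits the file in place and keeps the original as wordlist.txt.bak.
# LC_ALL=C is an assumption made here so that a-z means exactly the
# ASCII lowercase letters, whatever the surrounding locale is.
LC_ALL=C sed -i.bak '/[^a-z]/d' "$workdir/wordlist.txt"

filtered=$(cat "$workdir/wordlist.txt")      # lines that survived
original=$(cat "$workdir/wordlist.txt.bak")  # untouched backup copy
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

Note that Cherry is removed as well, since the uppercase C is not in a-z.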

EDIT:

Your regex doesn't work because you are trying to match:

( bracket
^ beginning of line
...
$ end of line
) bracket

As you won't have a bracket and then the beginning of the line, your regex simply doesn't match anything.

Note also that an expression of

s/\(^.*[^a-z\r\n].*$\)//g

wouldn't delete a line, but would replace it with a blank line.

EDIT2:

Note that in sed the -r flag changes the behaviour of \( and \): without the -r flag they are group indicators, but with the -r flag they're just literal brackets.
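A quick way to see this difference is to run the same kind of substitution with and without -r. This is only a sketch on made-up input strings, assuming GNU sed:

```shell
# Without -r (basic regex): \( \) form a group, plain ( ) are literal.
bre_group=$(printf 'abc\n' | sed 's/\(b\)/[\1]/')
bre_literal=$(printf '(abc)\n' | sed 's/(abc)/X/')

# With -r (extended regex): ( ) form a group, \( \) are literal.
ere_group=$(printf 'abc\n' | sed -r 's/(b)/[\1]/')
ere_literal=$(printf '(abc)\n' | sed -r 's/\(abc\)/X/')

# The command from the question: a literal ( before ^ can never match,
# so the line passes through unchanged.
unchanged=$(printf 'b4nana\n' | sed -r 's/\(^.*[^a-z\r\n].*$\)//g')

printf '%s\n' "$bre_group" "$bre_literal" "$ere_group" "$ere_literal" "$unchanged"
```

Both grouping forms produce a[b]c, both literal forms produce X, and the last command prints b4nana untouched.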

Upvotes: 2
