Remove blank lines in a file using sed

France  211 55  Europe

Japan   144 120 Asia
Germany 96  61  Europe

England 94  56  Europe




Taiwan  55  144 Asia
North Korea 44  2134    Asia

The above is my data file.

There are empty lines in it.

There are no spaces or tabs in those empty lines.

I want to remove all empty lines in the data.

I did a search, and "Delete empty lines using SED" has given the perfect answer.

Before that, I wrote two sed commands myself:

sed -r 's/\n\n+/\n/g' cou.data
sed 's/\n\n\n*/\n/g' cou.data

I also tried awk with gsub, which was not successful either:

awk '{ gsub(/\n\n*/, "\n"); print }' cou.data

But none of them work; nothing changes.

What did I do wrong in my sed code?

Upvotes: 8

Views: 9927

Answers (2)

datak

Reputation: 19

RobC's answer is great if your lines are terminated by a newline (linefeed, \n) only, because sed separates lines that way. If your lines are terminated by \r\n (CRLF), which you may have your reasons for doing even on a Unix system, you will not get a match: from sed's perspective the line isn't empty, because the \r (CR) counts as a character. Instead you can try:

sed '/^\r$/d' filename

Explanation:

  • ^ matches the start of the line
  • \r matches the carriage return
  • $ matches the end of the line
  • d deletes the selected line(s).
  • filename is the path to the input file.
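As far as I know, the \r escape inside a sed regex is a GNU extension, so the command above may not match on other sed implementations. Here is a minimal sketch of a variant that passes a literal carriage return instead; the sample data and the variable names (tmp, cr, crlf_result) are made up for illustration:

```shell
# Build a small CRLF-terminated sample in a temporary file (hypothetical data).
tmp=$(mktemp)
printf 'France\r\n\r\nJapan\r\n\r\n\r\nGermany\r\n' > "$tmp"

# Hold a literal carriage return in a variable. Command substitution strips
# only trailing newlines, so the CR survives.
cr=$(printf '\r')

# Delete lines that consist of nothing but a carriage return.
# \$ keeps the shell from treating $ as the start of a variable name.
crlf_result=$(sed "/^${cr}\$/d" "$tmp")

# Print the surviving lines; they still end in CR, only the blank ones are gone.
printf '%s\n' "$crlf_result"

rm -f "$tmp"
```

The non-blank lines keep their CRLF endings; only the lines that were "empty apart from the CR" are removed.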

Upvotes: 1

RobC

Reputation: 25042

Use the following sed to delete all blank lines.

sed '/./!d' cou.data

Explanation:

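  • /./ matches any line that contains at least one character. The trailing newline is not part of the pattern space, so an empty line has nothing for . to match.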
  • ! negates the selector, i.e. it makes the command apply to lines which do not match the selector, which in this case is the empty line(s).
  • d deletes the selected line(s).
  • cou.data is the path to the input file.
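To see the command at work without touching cou.data, here is a small sketch on a shortened copy of the question's data. The temporary file and the variable names (data, not_empty, anchored) are only for illustration. It also runs sed '/^$/d', which I'd expect to give the same result when the blank lines contain no spaces or tabs:

```shell
# Recreate a shortened version of the question's data in a temporary file.
data=$(mktemp)
printf 'France  211 55  Europe\n\nJapan   144 120 Asia\n\n\n\nTaiwan  55  144 Asia\n' > "$data"

# Delete every line that does NOT contain at least one character.
not_empty=$(sed '/./!d' "$data")

# Alternative formulation: delete lines whose start is immediately
# followed by their end.
anchored=$(sed '/^$/d' "$data")

# Print the three remaining data lines.
printf '%s\n' "$not_empty"

rm -f "$data"
```

Neither command modifies the file; redirect the output to a new file if you want to keep the result.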

Where did you go wrong?

The following excerpt from How sed Works states:

sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.

When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

The parts pertinent to why your sed examples are not working are "reads one line from the input stream, removes any trailing newline" and "adding back the trailing newline if it was removed". Given your examples:

  • They seem to disregard that sed reads one line at a time.
  • The newlines you're trying to match (\n\n+ and \n\n\n* in your first and second examples respectively) don't actually exist in the pattern space. Each line's trailing newline has been removed by the time your regexp is applied, and it is reinstated when the end of the script is reached.
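The two points above can be checked with a short sketch. It assumes GNU sed (the \n in the replacement is not portable to every sed), and the sample data and variable names are invented for the demonstration. The second sed command first collects the whole file in the pattern space, which is the situation your regexps were written for. The awk line is there because awk also works record by record, which is why your gsub found no \n either:

```shell
# Sample input with runs of empty lines (hypothetical data).
sample=$(mktemp)
printf 'A\n\n\nB\nC\n\nD\n' > "$sample"
original=$(cat "$sample")

# Per-line substitution: each cycle sees one line with its newline already
# stripped, so \n never matches and the output equals the input.
per_line=$(sed 's/\n\n*/\n/g' "$sample")

# Slurp the file first: ':a' sets a label, 'N' appends the next line to the
# pattern space, '$!ba' loops back unless this is the last line. Only then
# are there embedded newlines for the substitution to squeeze.
slurped=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\n\n*/\n/g' "$sample")

# awk splits input into records too; printing records that contain at
# least one character removes the empty ones.
awk_result=$(awk '/./' "$sample")

printf '%s\n' "$slurped"

rm -f "$sample"
```

The slurping version still leaves one empty line behind if the file begins or ends with empty lines, and it holds the whole file in memory, so the per-line sed '/./!d' remains the simpler choice.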

Upvotes: 10
