Nick Bull
Reputation: 9866

Strange behaviour using `sed`

So, I'm trying to remove tabs after the numbers in this table I'm formatting from the command line. Below is the original table data, copied and pasted directly from the file in question:

File Path                   Line  Description
/home/nick/.bashrc             9         # TODO        Chop this into code import files
/home/nick/.bashrc           204         # TODO        Add $HOME/os-setup to OS installation disc
/home/nick/.bashrc           207         # TODO        Custom power actions don't work; system tray notifications

When I add a final sed command to the pipeline, however, some strange behaviour occurs. As an example, consider the sed command below:

cat somefile.txt | column -tx -s : | sed -e 's/\([0-9]\{1,\}\)/\1/g'
File Path                   Line  Description
/home/nick/.bashrc             9         # TODO        Chop this into code import files
/home/nick/.bashrc           204         # TODO        Add $HOME/os-setup to OS installation disc
/home/nick/.bashrc           207         # TODO        Custom power actions don't work; system tray notifications

This finds the numbers in each row of the table, then replaces each match with the first capture group of the regular expression. As the whole match is wrapped in escaped parentheses, \( and \), nothing changes: each number is replaced by itself.
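A quick way to confirm that this substitution is a no-op, sketched on a made-up row (the `row` and `unchanged` variables and the exact spacing are illustrative, not taken from the real file):

```shell
# Hypothetical sample row; the spacing here is illustrative only.
row='/home/nick/.bashrc           204         # TODO'

# \( ... \) captures the run of digits; \1 writes the same digits back.
unchanged=$(printf '%s\n' "$row" | sed -e 's/\([0-9]\{1,\}\)/\1/g')

# If the substitution really is an identity, both strings are equal.
if [ "$row" = "$unchanged" ]; then
  echo "no change"
else
  echo "row was modified"
fi
```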

However, when I run the same sed command with \t (a literal tab) appended to the regex, the output loses the last digit of each number as well! See below:

cat somefile.txt | column -tx -s : | sed -e 's/\([0-9]\{1,\}\)\t/\1/g'
File Path                   Line  Description
/home/nick/.bashrc               # TODO        Chop this into code import files
/home/nick/.bashrc           20  # TODO        Add $HOME/os-setup to OS installation disc
/home/nick/.bashrc           20  # TODO        Custom power actions don't work; system tray notifications

Why does sed truncate the last digit from each number? How can I stop sed from doing this?

Upvotes: 0

Views: 46

Answers (1)

sjsam
Reputation: 21955

Instead of removing the tab after the number, I remove the extra spaces before the # TODO.

GNU awk solution (gensub is a gawk extension)

awk '{print gensub(/[ ]+(  # TODO)/,"\\1","g",$0)}' file

sed solution

sed -E 's/[ ]+# TODO/  # TODO/' file

Output

File Path                   Line  Description
/home/nick/.bashrc             9  # TODO        Chop this into code import files
/home/nick/.bashrc           204  # TODO        Add $HOME/os-setup to OS installation disc
/home/nick/.bashrc           207  # TODO        Custom power actions don't work; system tray notifications

Assumption

The description always begins with # TODO.

Note

You can use any number of spaces before the # TODO (inside the group in the awk version, in the replacement in the sed version). I used two.
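If the gap width needs to vary, the sed version could be wrapped roughly as below. This is only a sketch: `squeeze_todo` is a hypothetical helper name, it assumes a sed that accepts -E (as the command above does), and it relies on the same assumption that every description starts with # TODO.

```shell
# Hypothetical helper: collapse the run of spaces before "# TODO"
# down to exactly $1 spaces. Reads rows on stdin, writes to stdout.
squeeze_todo() {
  # Build a string of $1 spaces for the replacement side.
  pad=$(printf '%*s' "$1" '')
  # Same substitution as the sed solution, with a variable-width gap.
  sed -E "s/[ ]+# TODO/${pad}# TODO/"
}

# Illustrative row, not read from the real file.
sample='/home/nick/.bashrc           204         # TODO        Add $HOME/os-setup'
printf '%s\n' "$sample" | squeeze_todo 2
```

On real data the rows would be piped in instead, for example the output of the column command from the question.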

Upvotes: 1
