Mark

Reputation: 6464

sed: filter string subset from lines matching regexp

I have a file of the following format:

abc: A B C D E
abc: 1 2 3 4 5 
def  D E F G H
def: 10 11 12 23 99
...

That is, a line with strings after the ':' is a header for the following line of numbers. I'd like to use sed to extract only the lines that start with a given PATTERN string and contain numbers.

The number of values per line varies, but assume I know exactly how many to expect. I tried this command:

% sed 's/^abc: \([0-9]+ [0-9]+ [0-9]+\)$/\1/g' < file.txt

But it dumps all entries from the file. What am I doing wrong?

Upvotes: 0

Views: 1917

Answers (4)

Ed Morton

Reputation: 203189

With any sed:

$ sed -n 's/^abc: \([0-9 ]*\)$/\1/p' file
1 2 3 4 5

Upvotes: 0

Stephen P

Reputation: 14800

With @Mark's additional question in a comment "If I want to just extract the matched numbers (and remove prefix, e.g, abc)…" this is the pattern I came up with:

sed -En 's/^abc: (([0-9]+[ \t]?)+)[ \t]*$/\1/gp' file.txt

I'm using the -E flag for extended regular expressions to avoid all the escaping that would be needed.
Given this file:

abc: A B C D E
abc: 1 2 3 4 5 
abc: 1 c9 A 7f
def  D E F G H
def: 10 11 12 23 99

… this regex matches abc: 1 2 3 4 5 while excluding abc: 1 c9 A 7f. It also accepts a space or a tab between numbers and tolerates trailing whitespace.
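One portability note: \t inside a bracket expression is a GNU sed extension; in a POSIX bracket expression the backslash is literal. A minimal sketch of a variant that uses the POSIX class [[:blank:]] (space or tab) instead is below. The temporary file and the variable name nums are assumptions made for illustration, and unlike the command above this variant also accepts runs of several blanks between numbers.

```shell
# Illustrative sample input, mirroring the file shown above
# (the second line deliberately ends with a trailing space).
tmp=$(mktemp)
printf '%s\n' \
  'abc: A B C D E' \
  'abc: 1 2 3 4 5 ' \
  'abc: 1 c9 A 7f' \
  'def  D E F G H' \
  'def: 10 11 12 23 99' > "$tmp"

# [[:blank:]] is the POSIX character class for space and tab.
# The captured group ends on a digit, so trailing blanks are dropped.
nums=$(sed -En 's/^abc:[[:blank:]]+([0-9]+([[:blank:]]+[0-9]+)*)[[:blank:]]*$/\1/p' "$tmp")

printf '[%s]\n' "$nums"
rm -f "$tmp"
```

The brackets in the printf are only there to make any stray leading or trailing whitespace visible.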

Upvotes: 1

Quasímodo

Reputation: 4004

  1. sed does substitutions and prints each line, whether a substitution happens or not.

  2. Your regular expression is wrong. It would match only three numbers separated by spaces if the extended regex flag (-E) were given. Without it, not even that, because the + sign is interpreted literally.

  3. The best approach here is to use an address and print only the lines that match:

sed -nE '/^abc: [0-9]+ [0-9]+ [0-9]+ [0-9]+ [0-9]+$/p' < file.txt

or better,

sed -nE '/^abc:( [0-9]+){5}$/p' < file.txt

The -n flag disables the "print all lines" behavior of sed described in (1). Only the lines that reach the p command will be printed.
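The three points can be checked side by side with a small sketch. The temporary file, its contents (trailing spaces omitted so the $ anchor can match), and the variable names all, bre and only are assumptions made for illustration.

```shell
# Illustrative sample input (no trailing spaces).
tmp=$(mktemp)
printf '%s\n' \
  'abc: A B C D E' \
  'abc: 1 2 3 4 5' \
  'def: 10 11 12 23 99' > "$tmp"

# (1) Without -n, every line is printed, whether or not s/// matched.
all=$(sed -E 's/^abc: ([0-9]+( [0-9]+)*)$/\1/' "$tmp")

# (2) Without -E, "+" is a literal plus sign, so no line matches.
bre=$(sed -n '/^abc: [0-9]+ [0-9]+/p' "$tmp")

# (3) With -n and an address, only the matching line is printed.
only=$(sed -nE '/^abc:( [0-9]+){5}$/p' "$tmp")

printf '%s\n' "$all" '---' "$bre" '---' "$only"
rm -f "$tmp"
```

Here all holds three lines (only the second one changed), bre is empty, and only holds the single numeric abc line.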

Upvotes: 1

James Brown

Reputation: 37394

I read "to extract only a line starting with PATTERN string with numbers in the line" together with "Number of numbers in a line is variable" as meaning at least one number, so:

$ sed -n '/abc: \([0-9]\+\)/p' file

Output:

abc: 1 2 3 4 5 

With exactly 5 numbers, use:

$ sed -n '/^abc: \([0-9]\+\( \|$\)\)\{5\}$/p' file
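
Note that \+ and \| are GNU extensions to basic regular expressions. A minimal sketch of an equivalent that sticks to POSIX BRE syntax follows; it is anchored at both ends so that a line with six numbers is rejected. The temporary file, the extra six-number test line, and the variable name five are assumptions made for illustration.

```shell
# Illustrative sample input; the second line ends with a trailing space,
# and the third line has six numbers and should be rejected.
tmp=$(mktemp)
printf '%s\n' \
  'abc: A B C D E' \
  'abc: 1 2 3 4 5 ' \
  'abc: 1 2 3 4 5 6' \
  'def: 10 11 12 23 99' > "$tmp"

# \{1,\} is the POSIX BRE spelling of "one or more";
# " *$" allows optional trailing spaces before the end of the line.
five=$(sed -n '/^abc:\( [0-9]\{1,\}\)\{5\} *$/p' "$tmp")

printf '[%s]\n' "$five"
rm -f "$tmp"
```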

Upvotes: 1
