cliff_osborn

Reputation: 11

How do I extract a matching string from a file using grep, sed, or awk?

I have a file with lines like the following:

34:125 29:215 50:208
33:125 28:215 49:208
32:125 27:215 48:208

I want to extract all the entries for values xx:215, so the result should be:

29:215
28:215
27:215

I've tried various grep commands, but none gave the results I wanted. The output should be every entry in the file of the form xx:215. Matching the numbers in front of the colon is the issue, since they vary. Please help with the syntax. Thanks!

Upvotes: -4

Views: 107

Answers (3)

Ed Morton

Reputation: 204446

Using any POSIX awk and assuming by xx you mean 2 digits, not any 2 characters, this will print every complete (i.e. not a substring) space-separated string that's exactly 2 digits followed by :215 from any column in every row of your input:

$ awk '{for (i=1; i<=NF; i++) if ($i ~ /^[0-9]{2}:215$/) print $i}' file
29:215
28:215
27:215

It's always much easier to write a script that matches the strings you want than one that also avoids matching similar strings you don't want, so here's some more comprehensive sample input and, I assume, the expected output to test with:

$ cat file
34:215 29:215 50:208
33:125 28:2151 A9:215
32:125 7:215 48:208
32:125 27:215 48:215
32:125 127:215 48:208
32:125 $27:215 48:208

$ awk '{for (i=1; i<=NF; i++) if ($i ~ /^[0-9]{2}:215$/) print $i}' file
34:215
29:215
27:215
48:215

Note that it prints the desired xx:215 values from every column of every line as you asked for:

I want to extract all the entries for values xx:215

and, more importantly, it does not print [any part of] the x:215, xxx:215, or xx:215x values which you did not ask for, nor the case where it's xx:215 but one of the xs is not a digit.
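
If the value after the colon needs to vary, the same per-field loop can take it as a variable. The following is only a sketch: the helper name extract_pairs and the file name sample.txt are made up for illustration, and the value is pasted into a dynamic regex, so it is assumed to contain no regex metacharacters.

```shell
# Hypothetical sample file reproducing the test input above.
cat > sample.txt <<'EOF'
34:215 29:215 50:208
33:125 28:2151 A9:215
32:125 7:215 48:208
32:125 27:215 48:215
32:125 127:215 48:208
32:125 $27:215 48:208
EOF

# Hypothetical helper: $1 = value after the colon, $2 = input file.
# Prints every whole field that is exactly two digits, a colon, then $1.
# Assumption: $1 holds no regex metacharacters, since it is used in a regex.
extract_pairs() {
  awk -v want="$1" '{
    for (i = 1; i <= NF; i++)
      if ($i ~ ("^[0-9][0-9]:" want "$"))
        print $i
  }' "$2"
}

extract_pairs 215 sample.txt
```

Writing [0-9][0-9] instead of [0-9]{2} is a deliberate choice here, since some older awk versions do not support interval expressions.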

You could do the same with GNU grep (needed for -o and the \</\> word boundaries) if your input can't contain undesirable strings like $27:215 (which I included in the input, and which causes the final value below to print):

$ grep -Eo '\<[0-9]{2}:215\>' file
34:215
29:215
27:215
48:215
27:215

but then it's slightly less robust for different undesirable input, less portable and harder to build on if you have additional requirements in future.
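
A possible middle ground, sketched here rather than taken from the answer above, is to split the fields onto separate lines with tr and let grep -x require a whole-line match. It uses only POSIX options, assumes fields are separated by spaces, and the names whole_field_215 and sample.txt are hypothetical:

```shell
# Hypothetical sample file reproducing the test input above.
cat > sample.txt <<'EOF'
34:215 29:215 50:208
33:125 28:2151 A9:215
32:125 7:215 48:208
32:125 27:215 48:215
32:125 127:215 48:208
32:125 $27:215 48:208
EOF

# Hypothetical helper: $1 = input file.
# tr turns each space into a newline (squeezing repeats), so every field
# sits on its own line; grep -x then keeps only lines matching in full.
whole_field_215() {
  tr -s ' ' '\n' < "$1" | grep -Ex '[0-9]{2}:215'
}

whole_field_215 sample.txt
```

Because the match must cover the entire line, $27:215 is rejected here without relying on word boundaries.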

Upvotes: 0

Gilles Quénot

Reputation: 185620

With grep:

grep -oP '\d+:215\b' file
29:215
28:215
27:215

If your grep implementation is missing -P:

grep -o '[0-9]\+:215\b' file

  • \d+ is the PCRE shorthand for [0-9]+
  • \b matches a word boundary

-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.


With Perl:

perl -lne 'print $& if /\d+:215\b/' file
29:215
28:215
27:215

Upvotes: 2

jhnc

Reputation: 16819

A sed version:

<file sed 'y/ /\n/' | sed '/:215/!d'
  • convert space to newline
  • discard lines that don't have desired rhs
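
Note that the pipeline above keeps any field that contains :215, so a field such as 28:2151 would pass too. If that matters, one way to tighten it (a sketch only; the helper name and sample2.txt are hypothetical, and the input is assumed to be space-separated) is to anchor the second pattern:

```shell
# Hypothetical input: the question's data with one field changed to 28:2151.
printf '%s\n' \
  '34:125 29:215 50:208' \
  '33:125 28:2151 49:208' \
  '32:125 27:215 48:208' > sample2.txt

# Hypothetical helper: $1 = input file.
# First sed: turn every space into a newline, one field per line.
# Second sed: print only lines that are entirely digits, a colon, then 215.
fields_ending_215() {
  sed 'y/ /\n/' "$1" | sed -n '/^[0-9][0-9]*:215$/p'
}

fields_ending_215 sample2.txt
```

With the anchors in place, 28:2151 is dropped while 29:215 and 27:215 are kept.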

Upvotes: 0
