Alexandre Santos

Reputation: 8338

Finding a string after grep

I have this file:

a=1 b=2 1234j12342134h d="a v" id="y_123456" something else 
a=1 b=2 1234j123421341 d="a" something else 
a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else 
a=1 b=2 1234j123421344 d="a  v" something else 
a=1 b=2 1234j123421346 d="a.a." id="y_123410" something else 

and I want to keep only the lines that contain 'id=', and from those lines only the 3rd column and the value of id. The final product should be

1234j12342134h id="y_123456" 
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"

or

1234j12342134h "y_123456" 
1234j123421342 "y_123458"
1234j123421346 "y_123410"

or even

1234j12342134h y_123456 
1234j123421342 y_123458
1234j123421346 y_123410

I tried grep -o with the beginning and end of the expression, but that drops the first block of ids (the 3rd column). I tried awk, but that fails for columns whose values contain spaces.

I got it working with Java, but it is slow as the log files get bigger.

How can I do it using bash utilities?

Upvotes: 2

Views: 51

Answers (2)

B98

Reputation: 1239

If "bash utilities" can be read as shell builtins only (possibly a misreading on my part), the shell's read command can split each line into variables of your choice, based on the input field separator IFS (whitespace by default). For example, processing only your first line as a test case:

$ echo a=1 b=2 1234j12342134h d="a v" id="y_123456" something else | \
  if read ign1 ign2 f3 ign4 ign5 f6 rest
    then echo $f3 $f6;
  fi
1234j12342134h id=y_123456
$

You could go from here to cat and a while loop, reading all the lines and handling each according to its structure. (Note that in the example above you lose the quote characters, because the shell removes them from echo's arguments before read ever sees them; this also splits d="a v" into two fields, which is why the id ends up in the sixth variable.) Handling the pieces can become rather complex, requiring further commands and conditionals.

Therefore, better options would include using awk or Perl, with the string processing logic adapted from your Java solution. In any solution, splitting input at certain places in each line seems a good first step, since a single, all-encompassing regular expression for grep would seem rather tricky.
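For completeness, the while-loop idea mentioned above might be sketched as follows in bash. This is only an illustration under a few assumptions: the function name extract_ids is made up, the id value is assumed to contain no spaces or double quotes, and the third field is assumed to be blank-separated. It relies on bash's [[ =~ ]] regex match, so it is not plain POSIX sh.

```shell
#!/usr/bin/env bash
# extract_ids (hypothetical name): read lines on stdin and, for each line that
# contains id="...", print the 3rd blank-separated field and the id value.
extract_ids() {
  local line id f1 f2 f3 rest
  # Assumption: the id value contains neither spaces nor double quotes.
  local re='id="([^" ]+)"'
  while IFS= read -r line; do
    # Skip lines without a well-formed id="..." attribute.
    [[ $line =~ $re ]] || continue
    id=${BASH_REMATCH[1]}
    # Split the raw line on blanks; only the third field is needed here.
    read -r f1 f2 f3 rest <<< "$line"
    printf '%s %s\n' "$f3" "$id"
  done
}
```

It would be called as extract_ids < file. Because the loop reads the raw line with read -r, the quotes are still present when the regex is applied, and the value is printed without them. Expect this to be noticeably slower than awk on large logs.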

Upvotes: -1

Ed Morton

Reputation: 204406

With GNU awk (for the 3rd arg to match()):

$ gawk 'match($0,/id="[^" ]+"/,a){ print $3, a[0] }' file
1234j12342134h id="y_123456"
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"

With other awks:

$ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART,RLENGTH) }' file
1234j12342134h id="y_123456"
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"

Or, if you want to strip the leading id=" and the trailing quote, a couple of ways would be:

$ gawk 'match($0,/id="([^" ]+)"/,a){ print $3, a[1] }' file
1234j12342134h y_123456
1234j123421342 y_123458
1234j123421346 y_123410

or:

$ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART+4,RLENGTH-5) }' file
1234j12342134h y_123456
1234j123421342 y_123458
1234j123421346 y_123410

Upvotes: 5
