jerry

Reputation: 355

extract a string after a pattern

I want to extract the numbers that follow client_id and id, and pair each id with the client_id on the same line.

For example, given the following log lines:

User(client_id:03)) results:[RelatedUser(id:204, weight:10),_RelatedUser(id:491,_weight:10),_RelatedUser(id:29, weight: 20)

User(client_id:04)) results:[RelatedUser(id:209, weight:10),_RelatedUser(id:301,_weight:10)

User(client_id:05)) results:[RelatedUser(id:20, weight: 10)

I want to output

03 204
03 491
03 29
04 209
04 301
05 20

I think I need to use sed or awk, but I don't know exactly how.

Thanks

Upvotes: 7

Views: 3169

Answers (4)

potong

Reputation: 58558

This might work for you (GNU sed):

sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' file
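  • /.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d: if the line does not contain (client_id:N followed later by (id:N, delete it.
  • s//\2 \3\n\1/: the empty regex reuses the regex from the address. Replace the match with the client_id number, a space, the first id, a newline, and then the (client_id:N text again. The matched id is consumed, so the line gets shorter on each iteration.
  • P: print up to the introduced newline.
  • D: delete up to and including the introduced newline, then restart the cycle on what remains without reading a new input line.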
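
To make the rearrangement step easier to picture, here is a small sketch that runs the substitution once on its own and then runs the complete command on the same line. The function names first_pass and extract_pairs and the variable line are made up for this illustration, and GNU sed is assumed (for -r and for \n in the replacement).

```shell
# One line of sample data, copied from the question.
line='User(client_id:04)) results:[RelatedUser(id:209, weight:10),_RelatedUser(id:301,_weight:10)'

# first_pass (hypothetical name): the substitution alone, regex written out in full.
# In the full command the empty regex in s// reuses the address regex instead.
first_pass() {
  sed -r 's/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/\2 \3\n\1/'
}

# extract_pairs (hypothetical name): the complete command, reading stdin.
extract_pairs() {
  sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D'
}

# After one substitution: the pair, then a shortened line that still starts
# with (client_id:04 and still holds the remaining id.
printf '%s\n' "$line" | first_pass

# With P and D added, the shortened line is fed back in until no id is left.
printf '%s\n' "$line" | extract_pairs
```

The first command prints the pair 04 209 followed by the shortened remainder of the line; the second prints only the pairs, one per line.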

Upvotes: 3

Thor

Reputation: 47219

I would prefer awk for this, but if you were wondering how to do this with sed, here's one way that works with GNU sed.

parse.sed

/client_id/ {
  :a
  s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/
  ta
  s/^[^\n]+\n//
}

Run it like this:

sed -rf parse.sed infile

Or as a one-liner:

<infile sed -r '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'

Output:

03 204
03 491
03 29

04 209
04 301

05 20

Explanation:

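The idea is to repeatedly match a client_id:([0-9]+) and id:([0-9]+) pair and append it to the end of pattern space on a line of its own. Each pass removes the matched id:([0-9]+), so the loop stops once no id is left.

The final substitution deletes the first line of pattern space, which holds what is left of the original input, so only the pairs remain.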
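
As a rough illustration of a single pass, the sketch below applies the loop's substitution once, without the :a / ta loop, and then runs the whole script on the same line. The names one_pass, all_pairs and line are invented for this example, and GNU sed is assumed.

```shell
# One line of sample data, copied from the question.
line='User(client_id:04)) results:[RelatedUser(id:209, weight:10),_RelatedUser(id:301,_weight:10)'

# one_pass (hypothetical name): the loop body run exactly once.
one_pass() {
  sed -r 's/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/'
}

# all_pairs (hypothetical name): the full script, with -r for extended regex.
all_pairs() {
  sed -r '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'
}

# One pass: the first id is gone from the first line and the pair
# has been appended as a new last line of pattern space.
printf '%s\n' "$line" | one_pass

# Full loop plus the final cleanup substitution: only the pairs remain.
printf '%s\n' "$line" | all_pairs
```

After one pass the first line still contains client_id:04 and the remaining id:301, which is what lets the next pass find another pair.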

Upvotes: 2

sampson-chen

Reputation: 47367

Here's an awk script that works (I put it on multiple lines and made it a bit more verbose so you can see what's going on):

#!/bin/bash

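awk 'BEGIN{FS="[():,]"}          # split fields on ( ) : and ,
/client_id/ {
    cid="no_client_id"           # placeholder, printed if an id field comes before any client_id field
    for (i=1; i<NF; i++) {       # stop before NF because each test reads $(i+1)
        if ($i == "client_id") {
            cid = $(i+1)         # the value is the field right after the name
        } else if ($i == "id") {
            id = $(i+1);
            print cid OFS id;
        }
    }
}' input_file_name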

Output:

03 204
03 491
03 29
04 209
04 301
05 20

Explanation:

  • awk 'BEGIN{FS=...}: invoke awk and split each line into fields on (, ), : and ,
  • /client_id/ {: only do the following for lines that contain client_id
  • for (i=1; i<NF; i++) {: iterate through the fields of the line one at a time, stopping before the last field because each test looks at $(i+1)
  • if ($i == "client_id") { cid = $(i+1) }: if the current field is client_id, then its value is the next field
  • else if ($i == "id") { id = $(i+1); print cid OFS id;}: otherwise, if the current field is id, print the client_id and id pair to stdout, separated by OFS (a space by default)
  • input_file_name: the name of your input file, given as the argument after the awk program
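
If it helps to see how the delimiters carve up a line, this small sketch prints every field with its index for one sample line from the question. The function name show_fields is made up for the illustration; the field separator is the same set of four characters.

```shell
# show_fields (hypothetical name): print each field as index=[value].
show_fields() {
  awk -F'[():,]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# One line of sample data, copied from the question.
printf '%s\n' 'User(client_id:05)) results:[RelatedUser(id:20, weight: 10)' | show_fields
```

On this line, client_id lands in field 2 with its value in field 3, and id lands in field 7 with its value in field 8, which is why the script compares $i against the names and reads $(i+1).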

Upvotes: 4

Steve

Reputation: 54572

This may work for you:

awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' file

Results:

03 204
03 491
03 29
04 209
04 301
05 20

Upvotes: 5
