GopiGowtham
GopiGowtham

Reputation: 71

unix awk repeated pattern

I have the following variable. i want to search with pattern "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"

export str='16/02/02 11:29:22 INFO mortbay.log: State being saved: {"@class":"com.paypal.fpti.hadoop.copy.FPTICopyState","timestamp":0,"state":"Running","name":"com.paypal.fpti.hadoop.copy.FPTICopyState","id":"99c7cba7-d211-4845-97a1-c34168a91b22","subStates":{"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-4_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"99034acb-cfad-41a0-89ed-e2731b1f82ec","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-4","sourceDir":"/fpti/v2/hdfs_writer_4//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"40325dec-0fe2-4025-8258-f896f957ddf0","subStates":null,"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data","sourceDir":"/fpti/v2/hdfs_writer//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-1_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5216f8c1-2cfa-4eac-a390-f4d2bcd6584f","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-1","sourceDir":"/fpti/v2/hdfs_writer_1//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-2_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"5fcd0b6e-3df9-4f82-a76f-bc8ff1493623","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-2","sourceDir":"/fpti/v2/hdfs_writer_2//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-3_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"6ec9223a-fcf0-447a-b9ae-2020e3232f6d","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-3","sourceDir":"/fpti/v2/hdfs_writer_3//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"},"com.paypal.fpti.hadoop.copy.CopyToLocalJob_fpti-raw-data-5_2016/02/02/10/":{"@class":"com.paypal.fpti.hadoop.copy.CopyToJobState","timestamp":0,"state":"Stopped","name":"com.paypal.fpti.hadoop.copy.CopyToJobState","id":"d123742c-8a55-4e25-bfa0-0a97f6ed25d7","subStates":{},"instanceState":"PostDone","window":"2016-02-02T10:00:00.000Z","datasetname":"fpti-raw-data-5","sourceDir":"/fpti/v2/hdfs_writer_5//2016/02/02/10/","localDir":"/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"}},"copystate":"CopyToLocalDone","start":"2016-02-02T11:21:24.678Z","end":null,"window":"2016-02-02T10:00:00.000Z","retryCount":0}'

I tried like below it gives the first occurence alone

[ggangadharan@phxbastion2 ~]$ echo $str | awk '{match($0, "/x/home[/,a-z,0-9,_]+*", a)}END{print a[0]}'
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//

but i want output like below.

/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Can somebody help me how to use awk for this scenario?

thanks in advance

Upvotes: 2

Views: 225

Answers (3)

peak
peak

Reputation: 116880

Here are two approaches using a JSON-aware command-line tool, here jq.

In both cases we assume that the string of interest is embedded in the JSON object contained in $str

(1) In the following, we simply pretty-print the JSON object and grep for the string of interest in case it appears in a surprising spot. Further trimming of the result can easily be done (e.g. using sed) as desired:

$ sed 's/^[^{]*//' <<< "$str" | jq '.[]' | fgrep /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//"
"localDir": "/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//"

(2) The following query is appropriate if we are only interested in a match if it occurs in an object as a value corresponding to the key "localDir":

sed 's/^[^{]*//' <<< "$str" |
  jq -r '..
      | select(.localDir?)
      | .localDir
      | select(test("/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/"))'

/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Upvotes: 0

e0k
e0k

Reputation: 7161

Using "significant splitting" in AWK:

$ awk -v RS="\"" '/\/x\/home\/pp_dt_fpti_batch\/stampy_copy_orchestration\//' <<< "$str"

which gives

/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

You specified /x/home/pp_dt_fpti_batch/stampy_copy_orchestration/ for your search pattern, so I used that. If you want something different, use something different.

This separates input into records by a quote " (set RS to ", escaped in the shell). Any record matching the regular expression is printed. Input is given from the shell with the string $str. Maybe this is more readable:

$ awk -v RS='"' '/regexp/' <<< "$str"

Upvotes: 2

ferdy
ferdy

Reputation: 7716

I'm not sure how to hack this in awk, but you can safely use egrep here:

$ echo $str | egrep -o /x/home[/,a-z,0-9,_]+*
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_4//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_1//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_2//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_3//2016/02/02/10//
/x/home/pp_dt_fpti_batch/stampy_copy_orchestration/tmp_5//2016/02/02/10//

Upvotes: 2

Related Questions