Reputation: 45
I have a date in a JSON file as "ts":"2021-04-23T13:11:57Z"
or "2021-05-05T07:22:54+05:00".
I want to read the string using grep.
I need help forming the regex for the last part, i.e. the time zone.
My current command is:
grep -Po '"ts":"\K([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-2][0-9]:[0-5][0-9]:[0-5][0-9]+Z)'
This works fine for the first format. How do I modify it so that it works on both formats?
Upvotes: 3
Views: 1166
Reputation: 163277
For such a specific string, another option with a bit broader match could be
grep -Po '(?:"ts":)?"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")' file
Explanation
(?:"ts":)?    Optionally match "ts":
"\K    Match " and clear the match buffer (forget what is matched so far)
\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d    Match a date-time-like pattern with a T char in between
(?:    Non-capture group
Z    Match a Z char
|    Or
[+-]\d\d:\d\d    Match + or - and 2 digits : 2 digits
)    Close non-capture group
(?=")    Positive lookahead, assert " directly to the right
Output
2021-04-23T13:11:57Z
2021-05-05T07:22:54+05:00
Or using -E for extended regular expressions (the match will then include the outer double quotes and, when present, "ts":)
grep -Eo '("ts":)?"[0-9]{4}-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9](Z|[+-][0-9][0-9]:[0-9][0-9])"' ./file
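As a quick check of how the two commands differ, the sketch below runs both against a small sample. The sample variable and the function names are made up for illustration, and -P assumes GNU grep built with PCRE support. Since the -E match keeps "ts": and the quotes, a second grep is used here to trim them; that trimming step is an addition, not part of the command above.

```shell
# Hypothetical sample input: one JSON object per line.
sample='{"id":1,"ts":"2021-04-23T13:11:57Z"}
{"id":2,"ts":"2021-05-05T07:22:54+05:00"}'

# PCRE variant: \K drops the prefix, the lookahead leaves the closing quote out.
ts_pcre() {
  grep -Po '(?:"ts":)?"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")'
}

# ERE variant: the first match keeps "ts": and the quotes,
# so a second grep keeps only the part from the year up to the closing quote.
ts_ere() {
  grep -Eo '("ts":)?"[0-9]{4}-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9](Z|[+-][0-9][0-9]:[0-9][0-9])"' |
    grep -Eo '[0-9]{4}-[^"]+'
}

printf '%s\n' "$sample" | ts_pcre
printf '%s\n' "$sample" | ts_ere
```

Both functions read from standard input, so they can be fed from a file with input redirection as well.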
Upvotes: 1
Reputation: 84551
You can use the following to parse either time string from the line. You will need to isolate the line beginning with "ts": first. For example, the following grep expression will do:
grep -Po '[0-9+TZ:-]{2,}'
This simply extracts each run of characters drawn from [0-9+TZ:-] that is at least two characters long ({2,}).
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | grep -Po '[0-9+TZ:-]{2,}'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | grep -Po '[0-9+TZ:-]{2,}'
2021-05-05T07:22:54+05:00
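To illustrate why the line has to be isolated first, here is a minimal sketch that keeps only lines containing "ts" before applying the character-class match. The sample data and the extract_ts name are hypothetical; without the first grep, the digits on the "id" line would be printed too.

```shell
# Hypothetical multi-line JSON; the "id" line also contains a run of digits.
sample='{
  "id": 12345,
  "ts": "2021-05-05T07:22:54+05:00"
}'

# Step 1: keep only lines that contain "ts".
# Step 2: print each run of two or more timestamp characters.
extract_ts() {
  grep '"ts"' | grep -Po '[0-9+TZ:-]{2,}'
}

printf '%s\n' "$sample" | extract_ts
```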
The normal caveats apply: you are better served by a JSON-aware utility like jq. That said, you can separate the values with grep, but you must take care in isolating the line.
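For comparison, a jq-based sketch follows. It assumes jq is installed and that "ts" is a top-level key of each JSON object, which the question does not state; the sample data and the function name are hypothetical.

```shell
# Hypothetical input: a stream of JSON objects with a top-level "ts" key.
sample='{"ts":"2021-04-23T13:11:57Z"}
{"ts":"2021-05-05T07:22:54+05:00"}'

# -r prints the raw string value without the surrounding JSON quotes.
extract_ts_jq() {
  jq -r '.ts'
}

# Only run the example when jq is actually available.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | extract_ts_jq
else
  echo 'jq not found' >&2
fi
```

Because jq parses the JSON, this does not depend on how the file is indented or on which line the key appears.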
You can use sed to isolate the line using the normal /match/s/find/replace/ form with a capture group and backreference. For example, you can use:
sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
This matches the line beginning with ^[[:blank:]]*"ts" before extraction, and -n suppresses the normal printing of the pattern space so that only the wanted text is output, e.g.
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-05-05T07:22:54+05:00
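In a pretty-printed JSON file the "ts" line is often followed by a comma, which the "$ anchor in the command above would not match. A possible variant, offered as an assumption about the input rather than as part of the original command, allows an optional comma and trailing blanks. The sample data and the extract_ts_sed name are hypothetical.

```shell
# Hypothetical pretty-printed JSON where the "ts" line ends with a comma.
sample='{
  "ts": "2021-04-23T13:11:57Z",
  "id": 7
}'

# Same address and capture group as before; only the tail of the
# pattern changes to accept an optional comma and trailing blanks.
extract_ts_sed() {
  sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)",?[[:blank:]]*$/\1/p'
}

printf '%s\n' "$sample" | extract_ts_sed
```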
Upvotes: 4
Reputation: 133458
With your shown samples, and using GNU grep's PCRE option, you could try the following regex to match both timestamp formats.
grep -oP '(?:"ts":)?"\d{4}-\d{2}-\d{2}T(?:[0-1][0-9]|2[0-4]):(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])(?:Z"|\+(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])")' Input_file
Explanation: a detailed breakdown of the regex above.
(?:"ts":)? ##In a non-capturing group matching "ts": keeping it optional here.
"\d{4}-\d{2}-\d{2}T ##Matching " followed by 4 digits-2 digits-2 digits T here.
(?: ##Starting 1st non-capturing group here.
[0-1][0-9]|2[0-4] ##Matching 00 to 19 or 20 to 24 for hours here.
): ##Closing 1st non-capturing group followed by colon here.
(?: ##Starting 2nd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for mins here.
) ##Closing 2nd non-capturing group here.
: ##Matching a colon here.
(?: ##Starting 3rd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for seconds here.
) ##Closing 3rd non-capturing group here.
(?: ##Starting 4th non-capturing group here.
Z"|\+ ##Matching Z" OR +(literal character) here.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
) ##Closing non-capturing group here.
: ##Matching colon here.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
)" ##Closing non-capturing group here, followed by "
) ##Closing 4th non-capturing group here.
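The command above prints the "ts": prefix and the quotes along with the timestamp. If only the timestamp itself is wanted, one possible variant (a sketch, not this answer's own command) combines the same kind of range checks with \K and a lookahead. The strict_ts name and the sample lines are hypothetical; as assumptions of this sketch, hours are limited to 00 to 23 and a - offset is accepted as well as +.

```shell
# Range-checked match that prints only the timestamp.
# \K discards the "ts":" prefix; (?=") requires, but does not print, the closing quote.
strict_ts() {
  grep -oP '"ts":"\K\d{4}-\d{2}-\d{2}T(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](?:Z|[+-][0-5][0-9]:[0-5][0-9])(?=")'
}

# The third line has hour 25 and is therefore skipped.
printf '%s\n' \
  '"ts":"2021-04-23T10:00:57Z"' \
  '"ts":"2021-05-05T07:22:54+05:00"' \
  '"ts":"2021-05-05T25:22:54Z"' | strict_ts
```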
Upvotes: 5