Lovy

Reputation: 45

How to parse the following date using the grep command in bash

Given a date in a JSON file in the form "ts":"2021-04-23T13:11:57Z" or "2021-05-05T07:22:54+05:00", I want to read the string using grep.

I need help forming the regex for the last part, i.e. the time zone.

My current command is grep -Po '"ts":"\K[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-2][0-9]:[0-5][0-9]:[0-5][0-9]+Z'. This works fine for the first format. How do I modify it so that it works for both formats?

Upvotes: 3

Views: 1166

Answers (3)

The fourth bird

Reputation: 163277

For such a specific string, another option with a bit broader match could be

grep -Po '(?:"ts":)?"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")' file

Explanation

  • (?:"ts":)? Optionally match "ts":
  • "\K Match " and clear the match buffer (forget what is matched so far)
  • \d{4}-\d\d-\d\dT\d\d:\d\d:\d\d Match a date time like pattern with a T char in between
  • (?: Non capture group
    • Z Match a Z char
    • | Or
    • [+-]\d\d:\d\d Match + or - and 2 digits : 2 digits
  • ) Close non capture group
  • (?=") Positive lookahead, assert " directly to the right

Output

2021-04-23T13:11:57Z
2021-05-05T07:22:54+05:00
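
Since the question is mainly about the time zone part, here is a minimal sketch that isolates the same Z-or-offset alternation and prints only the zone suffix. It uses sed with POSIX extended regular expressions instead of PCRE, and the function name zone_of is hypothetical, chosen only for illustration.

```shell
# Print only the zone designator (Z, +hh:mm or -hh:mm) of each timestamp on stdin.
# zone_of is a hypothetical helper name, not part of any tool.
zone_of() {
    sed -En 's/^.*T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})$/\1/p'
}

# The two sample timestamps from the question.
printf '%s\n' '2021-04-23T13:11:57Z' '2021-05-05T07:22:54+05:00' | zone_of
```

Lines that do not end in a recognised zone designator print nothing, because -n suppresses default output and p prints only after a successful substitution.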

Or using -E for extended regular expressions (the output will then include the surrounding double quotes, and the "ts": prefix when it is present)

grep -Eo '("ts":)?"[0-9]{4}-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9](Z|[+-][0-9][0-9]:[0-9][0-9])"' ./file
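
If the quotes and prefix are unwanted without -P, one possible workaround is a second grep -Eo pass over the first match. This is a sketch under assumptions: the sample input is hypothetical, the "ts": prefix is made mandatory in the first pass, and extract_ts is an illustrative name.

```shell
# Hypothetical sample input; the real file layout is an assumption.
sample='{"id":1,"ts":"2021-04-23T13:11:57Z"}
{"id":2,"ts":"2021-05-05T07:22:54+05:00"}'

# ERE for the timestamp itself, kept in a variable for readability.
ts_re='[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})'

# First pass keeps "ts":"...", second pass keeps only the timestamp.
extract_ts() {
    grep -Eo "\"ts\":\"$ts_re\"" | grep -Eo "$ts_re"
}

printf '%s\n' "$sample" | extract_ts
```

The first pass anchors the match to the key, so other quoted timestamps in the input are ignored; drop that requirement if bare timestamps should match too.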

Upvotes: 1

David C. Rankin

Reputation: 84551

You can use the following to parse either time string from the line. You will need to isolate the line beginning with "ts" first. For example, the following grep expression will do:

grep -Po '[0-9+TZ:-]{2,}'

This simply extracts each run of at least two characters ({2,}) drawn from the set [0-9+TZ:-].

Example Use

$ echo '"ts":"2021-04-23T13:11:57Z"' | grep -Po '[0-9+TZ:-]{2,}'
2021-04-23T13:11:57Z

and

$ echo '"ts":"2021-05-05T07:22:54+05:00"' | grep -Po '[0-9+TZ:-]{2,}'
2021-05-05T07:22:54+05:00
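
To illustrate why the line has to be isolated first, here is a small sketch on a hypothetical multi-line sample (the "id" field is invented for the demonstration). It uses -E rather than -P, since this bracket expression and the {2,} interval are also valid in extended regular expressions.

```shell
# Hypothetical sample: the "id" line also contains a run of matching characters.
sample='{
  "id": 1234,
  "ts":"2021-04-23T13:11:57Z"
}'

# Without isolation, 1234 is reported as well as the timestamp.
printf '%s\n' "$sample" | grep -Eo '[0-9+TZ:-]{2,}'

# With isolation: keep only lines starting with "ts", then extract.
ts_only() {
    grep -E '^[[:blank:]]*"ts"' | grep -Eo '[0-9+TZ:-]{2,}'
}

printf '%s\n' "$sample" | ts_only
```

The first command prints both 1234 and the timestamp, while the isolated pipeline prints the timestamp alone.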

The normal caveats apply: you are better served by a JSON-aware utility like jq. That said, you can separate the values with grep, but you must take care in isolating the line.

You can use sed to isolate the line using the normal /match/s/find/replace/ form with a capture group and backreference. For example you can use:

sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'

The address /^[[:blank:]]*"ts"/ selects only lines that begin with optional whitespace followed by "ts" before the substitution runs, and -n suppresses the normal printing of the pattern space so that only the wanted text is output, e.g.

Example Use

$ echo '"ts":"2021-04-23T13:11:57Z"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-04-23T13:11:57Z

and

$ echo '"ts":"2021-05-05T07:22:54+05:00"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-05-05T07:22:54+05:00
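
For the jq route mentioned above, a minimal sketch, assuming each input line is a JSON object with a top-level "ts" key. The question does not show the full file, so this layout and the sample data are hypothetical.

```shell
# Hypothetical sample; assumes one JSON object per line with a top-level "ts" key.
sample='{"id":1,"ts":"2021-04-23T13:11:57Z"}
{"id":2,"ts":"2021-05-05T07:22:54+05:00"}'

# -r prints raw strings (no surrounding quotes); .ts selects the value of the key.
if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$sample" | jq -r '.ts'
else
    echo 'jq is not installed' >&2
fi
```

Because jq parses the JSON, it does not depend on the exact form of the timestamp or on how the file is split into lines.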

Upvotes: 4

RavinderSingh13

Reputation: 133458

With your shown samples, and using GNU grep's PCRE option (-P), you could try the following regex to match both timestamp formats.

grep -oP '(?:"ts":)?"\d{4}-\d{2}-\d{2}T(?:[0-1][0-9]|2[0-3]):(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])(?:Z"|\+(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])")' Input_file

Explanation: Adding detailed explanation for above.

(?:"ts":)?                ##In a non-capturing group matching "ts": keeping it optional here.
"\d{4}-\d{2}-\d{2}T       ##Matching " followed by 4 digits-2digits-2digits T here.
(?:                       ##Starting 1st non-capturing group here.
   [0-1][0-9]|2[0-3]      ##Matching 00 to 19 and 20 to 23 here to cover 24 hours.
):                        ##Closing 1st non-capturing group followed by colon here.
(?:                       ##Starting 2nd non-capturing group here.
   [0-4][0-9]|5[0-9]      ##Matching 00 to 59 for mins here.
)                         ##Closing 2nd non-capturing group here.
:                         ##Matching colon here.
(?:                       ##Starting 3rd non-capturing group here.
   [0-4][0-9]|5[0-9]      ##Matching 00 to 59 for seconds here.
)                         ##Closing 3rd non-capturing group here.
(?:                       ##Starting 4th non-capturing group here.
   Z"|\+                  ##Matching Z" OR +(literal character) here.
   (?:                    ##Starting non-capturing group here.
     [0-4][0-9]|5[0-9]    ##Matching 00 to 59 here.
   )                      ##Closing non-capturing group here.
   :                      ##Matching colon here.
   (?:                    ##Starting non-capturing group here.
     [0-4][0-9]|5[0-9]    ##Matching 00 to 59 here.
   )"                     ##Closing non-capturing group here, followed by "
)                         ##Closing 4th non-capturing group here.
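
Range alternations like the hour group are easy to get subtly wrong, so it can help to test one against boundary values before embedding it in a longer pattern. A small sketch, assuming the intended range is 00 to 23; the candidate values are chosen only for illustration.

```shell
# Candidate hour values, including the edge cases 00, 10, 23 and the invalid 24, 25.
candidates='00
09
10
19
23
24
25'

# Keep only values that are valid two-digit hours (00 to 23).
valid_hours() {
    grep -E '^([01][0-9]|2[0-3])$'
}

printf '%s\n' "$candidates" | valid_hours
```

Only 00, 09, 10, 19 and 23 should be printed; 24 and 25 are rejected.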

Upvotes: 5
