grzegorz9922

Reputation: 21

regex101 vs SED

I am quite new to SED on Linux, and I need to transform a string such as the one below:

5069 ;08 Aug 00:00;0

to

5069 ;08 Aug 2019 00:00:00;0

using SED.

I tested the regex on the regex101.com webpage, but in SED it does not seem to work correctly (I used the -r, --regexp-extended option).

regular expression:

(\s.*\d.*\s;)(\d\d)\s(Aug)\s(\d\d:\d\d)(;\d)

substitution:

\1\2 \3 2019 \4:00\5

result on the webpage (OK):

 5069 ;08 Aug 2019 00:00:00;0

But in bash the result is not OK:

echo "   5069 ;08 Aug 00:00;0" | sed -r 's/(\s.*\d.*\s;)(\d\d)\s(Aug)\s(\d\d:\d\d)(;\d)/\1\2 \3 2019 \4:00\5/g'

5069 ;08 Aug 00:00;0

What am I doing wrong? Thanks for any help.

Upvotes: 1

Views: 2894

Answers (2)

David C. Rankin

Reputation: 84561

awk is your friend; your translation can be handled by:

awk '{$3=$3" 2019"; $4="00:"$4}1'

Example Use/Output

$ echo "5069 ;08 Aug 00:00;0" | awk '{$3=$3" 2019"; $4="00:"$4}1'
5069 ;08 Aug 2019 00:00:00;0

Explanation

awk allows you to operate on the fields present in each line of input (by default, fields are separated by whitespace). awk starts counting fields at 1. Above, the string to be modified is piped to awk on stdin, and awk modifies it with the following rule (the part between {...}):

  • $3=$3" 2019" - uses string concatenation to add " 2019" to the 3rd field;
  • $4="00:"$4 - prepends "00:" to the 4th field, which gives 00:00:00 here because the time is 00:00; and
  • 1 at the end, after the rule, is an always-true pattern whose default action is to print the record.

Resulting in your desired string.

Note: you can have as many rules as you like that will be applied in the order listed.
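
Prepending "00:" produces the requested result only because the time in the example is 00:00. If the goal is to append a seconds part to an arbitrary HH:MM value, a minimal sketch could use sub() instead. It assumes the time is followed by the first ';' in the 4th field, as in the question; the function name add_year_awk is hypothetical.

```shell
# Hypothetical helper: add " 2019" after the month and ":00" seconds after the time.
# Assumes the layout from the question: <id> ;<day> <month> <HH:MM>;<flag>
add_year_awk() {
  awk '{
    $3 = $3 " 2019"        # month becomes "Aug 2019"
    sub(/;/, ":00;", $4)   # "12:34;0" becomes "12:34:00;0"
  } 1'
}

echo "5069 ;08 Aug 12:34;0" | add_year_awk
# prints: 5069 ;08 Aug 2019 12:34:00;0
```

As with the original one-liner, assigning to a field makes awk rebuild the record, so leading whitespace is dropped and runs of whitespace between fields collapse to single spaces.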

Upvotes: 2

azbarcea

Reputation: 3657

I would try something like:

$ echo " 5069 ;08 Aug 00:00;0" | sed -r 's/;([0-9]{2} [A-Z][a-z]{2}) ([0-9]{2}:[0-9]{2});/;\1 2019 \2:00;/g'
 5069 ;08 Aug 2019 00:00:00;0

The reason why \d does not match was answered in "how to match digits in regex":

\d and \w don't work in POSIX regular expressions, you could use [:digit:] though
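
Applied to the expression from the question, that means replacing every \d with [0-9] (or [[:digit:]]; the class name has to sit inside a bracket expression). A sketch, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). \s is also swapped for [[:space:]], since \s is a GNU extension, and the function name add_year_sed is only illustrative:

```shell
# Illustrative helper: the question's regex with \d and \s replaced by POSIX classes.
add_year_sed() {
  sed -E 's/([[:space:]].*[0-9].*[[:space:]];)([0-9]{2})[[:space:]](Aug)[[:space:]]([0-9]{2}:[0-9]{2})(;[0-9])/\1\2 \3 2019 \4:00\5/'
}

echo "   5069 ;08 Aug 00:00;0" | add_year_sed
# prints the input with " 2019" after the month and ":00" after the time
```

The capture groups and the replacement string are unchanged from the question; only the character classes differ.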

Upvotes: 0
