Reputation: 486

Re-index two digit strings based on occurrence of a common string

I have a urlwatch .yaml file that has this format:

name: 01_urlwatch update released
url: "https://github.com/thp/urlwatch/releases"
filter:
  - xpath:
      path: '(//div[contains(@class,"release-timeline-tags")]//h4)[1]/a'
  - html2text: re
---
name: 02_urlwatch webpage
url: "https://thp.io/2008/urlwatch/"
filter: 
  - html2text: re
  - grep: (?i)current\sversion  #\s Matches a whitespace character
  - strip # Strip leading and trailing whitespace 
---
name: 04_RansomWhere? Objective-See
url: "https://objective-see.com/products/ransomwhere.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #\s Matches a whitespace character
  - strip #Strip leading and trailing whitespace
---
name: 05_BlockBLock Objective-See
url: "https://objective-see.com/products/blockblock.html"
filter:
  - html2text: re
  - grep: (?i)current\sversion #(?i) \s 
  - strip #Strip leading and trailing whitespace
---

I need to "re-index" the two-digit number according to the occurrence of name: . In this example the first and second occurrences of name: are followed by the correct index numbers, but the third and fourth are not.

In the example above, the third and fourth occurrences of name: should be re-indexed to start with 03_ and 04_. That is: a two-digit index number followed by an underscore.

Also, there are instances of the string #name: which should not be counted in the re-indexing. (They have been commented out, so urlwatch does not act on those lines.)

I tried using sed but had trouble generating an index number based on the occurrence of the string. I don't have GNU sed but can install it if that is the only method.

Upvotes: 2

Answers (3)

Enlico

Reputation: 28366

I think this could be ok:

awk '/^name: / { sub(/[0-9]{2}/, ++i); sub(/ [1-9][^0-9]/,"\x0&"); sub(/\x0 /," 0") }; 1' your_input

On every line starting with name: , we substitute the first two-digit number ([0-9]{2}) with the counter i after incrementing it (i starts out uninitialized, which awk treats as 0, so the first increment gives 1). A second substitution marks the line if the number now has only one digit, and a third substitution adds the leading 0 and removes the mark.

Probably it's a bit fragile, but given your explanation, it looks fine.
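
Since the approach may be fragile, it could be worth checking the output afterwards. The following is only a sketch of such a check, assuming every entry line looks like "name: NN_..." as in the question; check_index is a made-up helper name, not part of urlwatch or awk.

```shell
# Hypothetical helper: report "name: " lines whose prefix is out of sequence.
check_index() {
  awk '
    # Commented-out "#name:" lines are skipped because of the ^ anchor.
    /^name: / {
      expected = sprintf("%02d_", ++n)
      # "name: " is 6 characters, so the NN_ prefix starts at position 7.
      found = substr($0, 7, 3)
      if (found != expected) {
        printf "line %d: expected %s, found %s\n", NR, expected, found
        bad = 1
      }
    }
    # Exit status 0 if every prefix was in sequence, 1 otherwise.
    END { exit bad }
  ' "$@"
}

# Example: a correctly numbered input passes silently.
printf 'name: 01_a\nname: 02_b\n' | check_index && echo "numbering is consecutive"
```

Run as check_index your_output (or pipe the awk result into it); any line it prints points at an entry that was not renumbered as expected.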

Upvotes: 3

thanasisp

Reputation: 5965

awk '/^name/{sub(/[0-9]{2}/,sprintf("%02d", ++c))}1' file

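For any line starting with "name" we replace the first two-digit number with our counter, which increments on every occurrence. The awk sprintf function (standard awk, not specific to GNU awk) formats it with a leading zero when needed. Commented-out #name: lines are not counted, because the pattern is anchored to the start of the line.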
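
If your awk does not accept interval expressions such as {2} (some older builds reportedly do not), a variant that spells out the two digits might help. This is a minimal sketch, assuming every entry line has the exact form "name: NN_..." shown in the question; the reindex function name and the sample data are made up for illustration.

```shell
# Hypothetical helper: renumber "name: NN_" prefixes in order of appearance.
reindex() {
  awk '
    # Only uncommented lines starting with "name: NN_" are counted;
    # "#name:" lines do not match because of the ^ anchor.
    /^name: [0-9][0-9]_/ {
      # Replace the old two-digit prefix with the zero-padded counter.
      sub(/^name: [0-9][0-9]_/, sprintf("name: %02d_", ++n))
    }
    { print }
  ' "$@"
}

# Small sample modelled on the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name: 01_first
#name: 09_disabled
name: 04_second
name: 05_third
EOF

result=$(reindex "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this sample the two entries after the commented-out line should come out as 02_ and 03_, and the #name: line is left alone. awk does not edit the file in place, so to update the real file you would redirect the output to a new file and move it over the original.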

Upvotes: 2

potong

Reputation: 58351

This might work for you (GNU sed):

sed -E '/^name:/{x;s/.*/expr & + 1/e;s/^.$/0&/;x;G;s/[0-9]+(.*)\n(.*)/\2\1/}' file

Match on a line beginning with name:, increment a counter in the hold space, append the hold space to the pattern space, then match the first run of digits and, using capture groups, substitute the counter for it.

Upvotes: 3
