jml

Reputation: 1806

regex matching components of string and repositioning

I have a number of strings in a file that I would like to find and reformat. I'm using gsed v4.7 on macOS 10.14.6 to do this. My goal is to break each string up into capture groups so that I can rearrange the pieces with backreferences.

Here is a single example of a candidate being transformed:

vib.h.p.a#3.synt 8

would become

vib.h.p.a#3.8.synt

...note that the number 8 is removed from the end and spliced between #3 and synt, separated by dots.

Here is a list of candidates:

vib.h.p.f2.synt 4
vib.h.p.g#2.synt 7
vib.h.p.a#3.synt 8

If you look at the components of this example string, they can be broken down into groups fairly easily.

I cannot find a way to formalize this into an expression that matches the needs of gsed.

Here is what I have tried:

gsed -r 's/(vib\.+)\.(.+)\s(\d)/\1.\3.\2/g' myfile.txt

gsed -r 's/vib\.(.*)\.(.*)\s(\d)/vib.\1\3\2/g' myfile.txt

gsed -r 's/(vib\..*)\.(.*)\s(\d)/\1.\3.\2/g' myfile.txt

I know that I'm missing something critical, possibly a way to lookahead negatively? My intuition tells me that I am close to a solution, although I've given up for the night.

EDIT 12/16/19 - The answer below by @Wiktor suggested a command like

gsed -r 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/g' myfile.txt

This does not print the desired transformation on my machine. Instead, it prints the original text without any substitutions, as it is not matching successfully. I am unable to test on another machine, so I do not know if this is the correct answer, but I have tried all variants suggested, including using [[:space:]], [[:blank:]], [0-9], and + instead of *. If anyone can help I would appreciate it.

Upvotes: 1

Views: 142

Answers (4)

ghoti

Reputation: 46856

This seems like a simple one to me. What am I missing?

echo "vib.h.p.f2.synt 4" | sed -E 's/(.*[0-9]+)(\.[^0-9]+) ([0-9]+)$/\1.\3\2/g'
vib.h.p.f2.4.synt

Note that this was done with stock sed in macOS, where -E gets you ERE.

Note also that this could be done using character classes, like this:

... sed -E 's/(.*[[:digit:]]+)(\.[^[:digit:]]+) ([[:digit:]]+)$/\1.\3\2/g'

But if you need to use character classes, you probably already know that. :)
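
As a rough check, the same substitution can be run over all three candidates from the question. This is only a sketch: reformat_candidates is a made-up helper name, and it assumes a sed that accepts -E (stock macOS sed and recent GNU sed both do). The g flag is left out because the $ anchor allows at most one match per line anyway.

```shell
# Hypothetical helper wrapping the substitution above; reads lines on stdin.
reformat_candidates() {
  sed -E 's/(.*[0-9]+)(\.[^0-9]+) ([0-9]+)$/\1.\3\2/'
}

# Feed it the three candidate lines from the question.
reformat_candidates <<'EOF'
vib.h.p.f2.synt 4
vib.h.p.g#2.synt 7
vib.h.p.a#3.synt 8
EOF
# vib.h.p.f2.4.synt
# vib.h.p.g#2.7.synt
# vib.h.p.a#3.8.synt
```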

Upvotes: 0

CinCout

Reputation: 9619

Use this regex:

([.#0-9a-zA-Z]+\.)(\S*)\s+([0-9]+)

and replace with $1$3.$2

Demo
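
In sed terms, a sketch of the same idea might look like the following. The translation is an assumption rather than part of the demo: sed writes backreferences as \1 instead of $1, and \S and \s are spelled as POSIX classes here so the command does not lean on GNU extensions. splice_number is a hypothetical name.

```shell
# Hypothetical helper: the regex above, rewritten for sed -E.
#   \S*  ->  [^[:space:]]*
#   \s+  ->  [[:space:]]+
#   $1$3.$2  ->  \1\3.\2
splice_number() {
  sed -E 's/([.#0-9a-zA-Z]+\.)([^[:space:]]*)[[:space:]]+([0-9]+)/\1\3.\2/'
}

printf '%s\n' 'vib.h.p.g#2.synt 7' | splice_number
# vib.h.p.g#2.7.synt
```

Group 1 keeps its trailing dot, which is why the replacement has no dot between \1 and \3.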

Upvotes: 1

jml

Reputation: 1806

I think I finally found something that does the replacement I was hoping for.

gsed -r 's/(vib.\w.)(\w+.(\w[0-9]|\w\#[0-9]).)(\w+)\s([0-9])/\1\2\5.\4/g' myfile.txt

This works for my needs, but there is probably a far more elegant way. I'm including the text I used as a test here, in the event that someone can figure out what a better solution would be.
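
One possible tightening, offered as a sketch and not a definitive answer: the bare dots in the command above match any character, not just a literal dot, so escaping them makes the pattern stricter. The version below also swaps \w and \s for POSIX classes and merges the two note alternatives into a single optional #. The name strict_reformat is hypothetical, and the pattern assumes every line looks like the three candidates in the question (two dot-separated fields after vib, a note letter, an optional #, one digit).

```shell
# Hypothetical stricter variant: literal dots are escaped, \w and \s are
# replaced by POSIX classes, and "#?" covers both f2 and g#2 style notes.
strict_reformat() {
  sed -E 's/(vib\.[[:alnum:]]+\.[[:alnum:]]+\.[[:alpha:]]#?[0-9]\.)([[:alnum:]]+)[[:space:]]+([0-9]+)/\1\3.\2/'
}

strict_reformat <<'EOF'
vib.h.p.f2.synt 4
vib.h.p.g#2.synt 7
vibXhXpXa#3Xsynt 8
EOF
# vib.h.p.f2.4.synt
# vib.h.p.g#2.7.synt
# vibXhXpXa#3Xsynt 8   (left alone: the separators are not dots)
```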

Upvotes: 0

Wiktor Stribiżew

Reputation: 627082

You may use

gsed -r 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/g' myfile.txt

The main points:

  • \.+ matches one or more literal dots, not one or more arbitrary characters, so you need to remove the backslash
  • \d and \s are not portable: \s is a GNU extension, and GNU sed does not treat \d as a digit class, so it makes sense to replace \d with [0-9] and \s with a space or [[:blank:]]
  • If Group 3 can contain more than one digit, only the first digit would be moved and the rest left behind; add + after [0-9] (since you use the -r option, POSIX ERE syntax treats + as a one-or-more quantifier).
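
The first and third points can be seen in a small sketch. The helper names below are made up for illustration, and sed -E is used in place of gsed -r on the assumption that both select ERE syntax. The outputs in the comments follow from how the patterns match a plain space-separated line.

```shell
# Hypothetical helpers, one per variant of the pattern.
escaped_dot()  { sed -E 's/(vib\.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/'; }
plain_dot()    { sed -E 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/'; }
single_digit() { sed -E 's/(vib.+)\.(.+)[[:blank:]]+([0-9])/\1.\3.\2/'; }

# "vib\.+\." needs at least two dots right after "vib", so nothing matches.
printf '%s\n' 'vib.h.p.a#3.synt 8' | escaped_dot
# vib.h.p.a#3.synt 8

# Without the backslash, .+ runs up to the last dot before the blank.
printf '%s\n' 'vib.h.p.a#3.synt 8' | plain_dot
# vib.h.p.a#3.8.synt

# A lone [0-9] moves only the first digit and strands the second.
printf '%s\n' 'vib.h.p.a#3.synt 12' | single_digit
# vib.h.p.a#3.1.synt2

# With [0-9]+ the whole number moves.
printf '%s\n' 'vib.h.p.a#3.synt 12' | plain_dot
# vib.h.p.a#3.12.synt
```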

Upvotes: 0
