Reputation: 1806

regex matching components of string and repositioning

I have a number of strings in a file that I would like to find and reformat. I'm using gsed v4.7 on macOS 10.14.6 to do this. My goal is to capture the parts of each string as groups so that I can reassemble them in a new order with backreferences.

Here is a single example of a candidate being transformed:

vib.h.p.a#3.synt 8

would become

vib.h.p.a#3.8.synt

Note that the trailing 8 (along with the space before it) is removed from the end and spliced between #3 and synt, with a dot on either side.

Here is a list of candidates:

vib.h.p.f2.synt 4
vib.h.p.g#2.synt 7
vib.h.p.a#3.synt 8

The components of these strings can be broken down into groups fairly easily: everything before the last dot, the word after it, and the trailing number.
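
To make that grouping concrete, here is a sketch (not from the original post) that splits one line into the same three pieces with POSIX shell parameter expansion instead of sed. It assumes each line contains exactly one space, just before the number; the variable names are illustrative.

```shell
#!/bin/sh
# Illustration only: split one candidate into the pieces a regex would
# capture, using POSIX parameter expansion rather than sed.
line='vib.h.p.a#3.synt 8'

num=${line##* }        # text after the last space: 8
rest=${line% *}        # text before the last space: vib.h.p.a#3.synt
prefix=${rest%.*}      # everything before the last dot: vib.h.p.a#3
suffix=${rest##*.}     # everything after the last dot: synt

# Reassemble in the new order: prefix, number, suffix.
result="$prefix.$num.$suffix"
printf '%s\n' "$result"
```

This prints vib.h.p.a#3.8.synt, which is the same reordering the sed expressions below aim for.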

However, I cannot find a way to turn this into a regular expression that works with gsed.

Here is what I have tried:

gsed -r 's/(vib\.+)\.(.+)\s(\d)/\1.\3.\2/g' myfile.txt

gsed -r 's/vib\.(.*)\.(.*)\s(\d)/vib.\1\3\2/g' myfile.txt

gsed -r 's/(vib\..*)\.(.*)\s(\d)/\1.\3.\2/g' myfile.txt

I know I'm missing something critical, possibly a negative lookahead? My intuition tells me I'm close to a solution, but I've given up for the night.

EDIT 12/16/19 - The answer below by @Wiktor suggested a command like

gsed -r 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/g' myfile.txt

This does not produce the desired transformation on my machine. Instead, it prints the original text unchanged, because the pattern does not match. I am unable to test on another machine, so I do not know whether this is the correct answer, but I have tried all the suggested variants, including [[:space:]], [[:blank:]], [0-9], and + instead of *. If anyone can help, I would appreciate it.
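
One way to narrow this down (a suggestion, not something from the thread) is to run the same expression on a line typed directly, so that myfile.txt is out of the picture. The sketch uses sed -E, which both stock macOS sed and GNU sed accept as the ERE switch, in place of -r.

```shell
#!/bin/sh
# Feed one known line to the expression, bypassing myfile.txt entirely.
out=$(printf '%s\n' 'vib.h.p.a#3.synt 8' |
  sed -E 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/g')
printf '%s\n' "$out"

# If the line above is transformed but the file is not, inspect the file's
# raw bytes between "synt" and the digit. Commented out so the sketch runs
# without myfile.txt present:
# head -n 3 myfile.txt | od -c
```

If the typed line comes out transformed but the file does not, the expression itself is fine and the difference lies in the file's contents, which od -c would make visible.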

Upvotes: 1

Answers (4)

ghoti

Reputation: 46856

This seems like a simple one to me. What am I missing?

echo "vib.h.p.f2.synt 4" | sed -E 's/(.*[0-9]+)(\.[^0-9]+) ([0-9]+)$/\1.\3\2/g'
vib.h.p.f2.4.synt

Note that this was done with stock sed in macOS, where -E gets you ERE.
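
A quick way to check this against every candidate listed in the question is to pipe all three lines through the same expression. This sketch feeds the lines with printf instead of reading a file.

```shell
#!/bin/sh
# Run the answer's expression over all three sample lines from the question.
out=$(printf '%s\n' 'vib.h.p.f2.synt 4' 'vib.h.p.g#2.synt 7' 'vib.h.p.a#3.synt 8' |
  sed -E 's/(.*[0-9]+)(\.[^0-9]+) ([0-9]+)$/\1.\3\2/g')
printf '%s\n' "$out"
# vib.h.p.f2.4.synt
# vib.h.p.g#2.7.synt
# vib.h.p.a#3.8.synt
```

Group 1 has to end in a digit that is followed by a dot, so this relies on every candidate having a digit just before the final .synt, which holds for the three samples.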

Note also that this could be done using character classes, like this:

... sed -E 's/(.*[[:digit:]]+)(\.[^[:digit:]]+) ([[:digit:]]+)$/\1.\3\2/g'

But if you need to use character classes, you probably already know that. :)
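
To apply the [[:digit:]] variant to a whole file, one portable approach (an assumption, not part of the answer) is to write the output to a new file and move it into place, because in-place editing is spelled -i '' on macOS sed but -i on GNU sed. The temporary file here is only a stand-in for myfile.txt.

```shell
#!/bin/sh
# Sketch: rewrite a file with the [[:digit:]] variant without using sed -i.
set -e

# Stand-in for myfile.txt so the sketch is self-contained.
file=$(mktemp "${TMPDIR:-/tmp}/candidates.XXXXXX")
printf '%s\n' 'vib.h.p.f2.synt 4' 'vib.h.p.g#2.synt 7' 'vib.h.p.a#3.synt 8' > "$file"

# Write the transformed lines to a new file, then replace the original.
sed -E 's/(.*[[:digit:]]+)(\.[^[:digit:]]+) ([[:digit:]]+)$/\1.\3\2/g' "$file" > "$file.new"
mv "$file.new" "$file"

out=$(cat "$file")
rm -f "$file"
printf '%s\n' "$out"
```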

Upvotes: 0

CinCout

Reputation: 9619

Use this regex:

([.#0-9a-zA-Z]+\.)(\S*)\s+([0-9]+)

and replace with $1$3.$2
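
The $1/$2/$3 replacement syntax comes from other regex tools; in sed the backreferences are written \1, \2, \3. A sketch of the same regex for sed -E might look like this. Here \S and \s are swapped for the POSIX classes [^[:space:]] and [[:space:]], on the assumption that stock macOS sed may not accept the backslash shorthands.

```shell
#!/bin/sh
# Same three groups as the answer, written for sed -E with \1\3.\2.
out=$(printf '%s\n' 'vib.h.p.f2.synt 4' 'vib.h.p.g#2.synt 7' 'vib.h.p.a#3.synt 8' |
  sed -E 's/([.#0-9a-zA-Z]+\.)([^[:space:]]*)[[:space:]]+([0-9]+)/\1\3.\2/')
printf '%s\n' "$out"
# vib.h.p.f2.4.synt
# vib.h.p.g#2.7.synt
# vib.h.p.a#3.8.synt
```

Group 1 keeps its trailing dot, which is why the replacement needs only one explicit dot between the number and the final word.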


Upvotes: 1

jml

Reputation: 1806

I think I finally found something that does the replacement I was hoping for.

gsed -r 's/(vib.\w.)(\w+.(\w[0-9]|\w\#[0-9]).)(\w+)\s([0-9])/\1\2\5.\4/g' myfile.txt

This works for my needs, but there is probably a far more elegant way. I'm including the text I used as a test here, in case someone can figure out a better solution.
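
As a check, here is the same command run over the three sample lines from the question. \w and \s are GNU extensions, so this assumes GNU sed; the SED variable is a hypothetical convenience that defaults to sed and can be set to gsed on macOS.

```shell
#!/bin/sh
# Assumes GNU sed. On macOS, run with SED=gsed.
SED=${SED:-sed}
out=$(printf '%s\n' 'vib.h.p.f2.synt 4' 'vib.h.p.g#2.synt 7' 'vib.h.p.a#3.synt 8' |
  "$SED" -r 's/(vib.\w.)(\w+.(\w[0-9]|\w\#[0-9]).)(\w+)\s([0-9])/\1\2\5.\4/g')
printf '%s\n' "$out"
```

One caveat: the unescaped dots in vib.\w. and \w+. match any character, not only a literal dot, so escaping them as \. would make the pattern stricter.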

Upvotes: 0

Wiktor Stribiżew

Reputation: 627082

You may use

gsed -r 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/g' myfile.txt

The main points:

\.+ matches one or more literal dots, not one or more arbitrary characters, so you need to remove the backslash
\d and \s are not POSIX ERE: \d is not a digit class in sed, and \s is a GNU extension, so it makes sense to replace \d with [0-9] and \s with a space or [[:blank:]]
If Group 3 can hold more than one digit, a single-digit pattern moves only part of the number, so add + (since you use the -r option, POSIX ERE syntax treats + as a one-or-more quantifier).
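
The point about + can be seen by running both forms on a line with a two-digit number. The value 12 is made up for this illustration; the candidates in the question all have a single digit. The sketch uses sed -E in place of -r.

```shell
#!/bin/sh
# Hypothetical two-digit input, used only to show the difference.
line='vib.h.p.a#3.synt 12'

# Group 3 is a single digit: only the 1 moves, the 2 is left at the end.
one=$(printf '%s\n' "$line" | sed -E 's/(vib.+)\.(.+)[[:blank:]]+([0-9])/\1.\3.\2/')

# Group 3 is one or more digits: the whole number moves.
many=$(printf '%s\n' "$line" | sed -E 's/(vib.+)\.(.+)[[:blank:]]+([0-9]+)/\1.\3.\2/')

printf '%s\n%s\n' "$one" "$many"
# vib.h.p.a#3.1.synt2
# vib.h.p.a#3.12.synt
```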

Upvotes: 0
