Reputation:
I've been working with regex on strings recently and I've hit a snag. You see, I'm trying to get this:
chocolatecakes
thecakeismine
cakessurpassexpectation
to look like this:
chocolate_cakes
the_cake_ismine
cakes_surpassexpectation
However, when I use this:
#!/bin/bash
words_array=(is cake)
number_of_times=0
# Insert "_" before the word ($2) when a letter precedes it,
# and after it when a letter follows it.
word_underscorer () {
    echo "$1" | sed -r "s/([a-z])($2)/\1_\2/g" | sed -r "s/($2)([a-z])/\1_\2/g"
}
for words_to_underscore in "${words_array[@]}"; do
    if [ "$number_of_times" -eq 0 ]; then
        first=$(word_underscorer "chocolatecakes" "$words_to_underscore")
        second=$(word_underscorer "thecakeismine" "$words_to_underscore")
        third=$(word_underscorer "cakessurpassexpectation" "$words_to_underscore")
    else
        word_underscorer "$first" "$words_to_underscore"
        word_underscorer "$second" "$words_to_underscore"
        word_underscorer "$third" "$words_to_underscore"
    fi
    echo "$first"
    echo "$second"
    echo "$third"
done
I get this:
chocolate_cake_s
the_cake_ismine
cake_ssurpassexpectation
I'm not sure how to fix this.
Upvotes: 0
Views: 64
Reputation: 58568
This might work for you (GNU sed):
sed -r 's/\B([^_])\B(cakes?|is)\B/\1_\2/g;s/(cakes?|is)\B([^_])\B/\1_\2/g' file
Insert an underscore in front of or behind a particular word if that word is inside another word and the character before or after it is not already an underscore.
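To try this one-liner on the question's three strings, here is a minimal sketch, assuming GNU sed (\B and -r are GNU features) and a POSIX shell; the helper name underscore_words is my own, not part of the answer:

```shell
#!/bin/sh
# Hypothetical helper wrapping this answer's GNU sed one-liner.
# \B ("not a word boundary") is a GNU extension, so GNU sed is required.
underscore_words() {
  printf '%s\n' "$1" |
    sed -r 's/\B([^_])\B(cakes?|is)\B/\1_\2/g;s/(cakes?|is)\B([^_])\B/\1_\2/g'
}

# Run it on each sample string from the question.
for s in chocolatecakes thecakeismine cakessurpassexpectation; do
  underscore_words "$s"
done
```

With GNU sed it should print chocolate_cakes, the_cake_ismine and cakes_surpassexpectation, one per line.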
Upvotes: 0
Reputation: 98118
If you write the words, one per line, to a file called words, then you can do something like this:
sed -e 's/\('$(sed ':l;N;s/\n/\\|/;bl' words )'\)/\1_'/g -e 's/_$//' input
This gives you:
chocolate_cakes
the_cake_ismine
cakes_surpassexpectation
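To run the command above end to end, here is a sketch that writes a words file (the four words that appear in the constructed command below, one per line) and an input file into a temporary directory. It assumes GNU sed: the inner command relies on N printing the pattern space when it runs out of input, and the outer one on the \| alternation in basic regular expressions.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# The words file (one word per line) and the sample input.
printf '%s\n' chocolate cake the cakes > words
printf '%s\n' chocolatecakes thecakeismine cakessurpassexpectation > input

# The answer's command, unchanged: the inner sed joins the lines of
# "words" with \| so the outer sed sees \(chocolate\|cake\|the\|cakes\).
result=$(sed -e 's/\('$(sed ':l;N;s/\n/\\|/;bl' words )'\)/\1_'/g -e 's/_$//' input)
printf '%s\n' "$result"

# Clean up the temporary directory.
cd / && rm -rf "$tmp"
```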
The main point is to construct this sed command:
sed -e 's/\(chocolate\|cake\|the\|cakes\)/\1_/g' -e 's/_$//' input
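If you would rather keep the words in the script instead of a file, the same alternation can be built from a shell variable. This is a sketch under my own assumptions: the names words, pattern and underscore_words are hypothetical, GNU sed's -r is assumed, and cakes is listed before cake as a precaution so the longer word wins even with a regex engine that takes the first alternative that matches rather than the longest.

```shell
#!/bin/sh
# Hypothetical word list; longer words placed first as a precaution.
words='chocolate cakes cake the'

# Join the words with | to form an extended-regex alternation.
pattern=
for w in $words; do
  pattern="${pattern:+$pattern|}$w"
done
# pattern is now: chocolate|cakes|cake|the

underscore_words() {
  # Append "_" after every listed word, then drop a trailing "_".
  printf '%s\n' "$1" | sed -r "s/($pattern)/\1_/g; s/_\$//"
}

for s in chocolatecakes thecakeismine cakessurpassexpectation; do
  underscore_words "$s"
done
```

Keeping the alternation in one variable makes adding a word a one-line change. Note that the words are used as regular expressions, so a word containing characters such as . or * would need escaping.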
Upvotes: 1
Reputation: 47282
Based on what you've shown, you could replace the sed pipeline inside word_underscorer with something such as:
echo "$1" | sed -r -e "s/($2)/_\1_/g" -e "s/($2)_s|^($2)(_*)/\1s\2_/g" -e "s/^_|_$//g"
That should return the final result of:
chocolate_cakes
the_cake_ismine
cakes_surpassexpectation
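A minimal sketch of where this command could go, assuming it replaces the two-sed pipeline inside the question's word_underscorer (GNU sed assumed for -r). It is run with just cake here: because number_of_times is never incremented in the question's loop, every pass starts again from the original strings, so the last three lines the loop prints come from the cake pass alone.

```shell
#!/bin/sh
# word_underscorer with this answer's three sed operations (GNU sed assumed):
#   1. wrap every occurrence of the word ($2) in underscores
#   2. re-join a following "s" (word_s -> words_)
#   3. strip an underscore at the very start or end of the string
word_underscorer() {
  printf '%s\n' "$1" |
    sed -r -e "s/($2)/_\1_/g" -e "s/($2)_s|^($2)(_*)/\1s\2_/g" -e "s/^_|_\$//g"
}

for s in chocolatecakes thecakeismine cakessurpassexpectation; do
  word_underscorer "$s" cake
done
```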
The idea here is to process by elimination. That is not to say this method has no potential issues; hopefully you'll see what I mean below. Each sed operation is labeled by number to help you see what is happening.
The sed commands are applied once for each word in the array, first "is" and then "cake":
1. is -> _is_
2. is_s -> iss_ (any other is_ is left alone)
3. _is_ -> is
1. cake -> _cake_
2. cake_s -> cakes_ (any other cake_ is left alone)
3. _cake_ -> cake
string one:
1. chocolatecakes -> chocolate_cake_s
2. chocolate_cake_s -> chocolate_cakes_
3. chocolate_cakes_ -> chocolate_cakes
string two:
1. thecakeismine -> the_cake_ismine
2. the_cake_ismine -> no change
3. the_cake_ismine -> no change
string three:
1. cakessurpassexpectation -> _cake_ssurpassexpectation
2. _cake_ssurpassexpectation -> _cakes_surpassexpectation
3. _cakes_surpassexpectation -> cakes_surpassexpectation
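The numbered walkthrough above can be reproduced with a small tracing sketch. The function name trace_steps is hypothetical; it applies the three operations from this answer one at a time for a single word (GNU sed assumed) and prints each intermediate string in the same "N. before -> after" form.

```shell
#!/bin/sh
# Hypothetical tracer: apply this answer's three sed operations one at a
# time for the word in $2 (GNU sed assumed) and print "N. before -> after".
trace_steps() {
  s=$1
  n=1
  for step in "s/($2)/_\1_/g" "s/($2)_s|^($2)(_*)/\1s\2_/g" 's/^_|_$//g'; do
    next=$(printf '%s\n' "$s" | sed -r -e "$step")
    printf '%s. %s -> %s\n' "$n" "$s" "$next"
    s=$next
    n=$((n + 1))
  done
}

for str in chocolatecakes cakessurpassexpectation; do
  trace_steps "$str" cake
done
```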
So you can see what the issue might be with the "is" portion of the array: it could get broken up in an undesired way during the sed operation if it somehow ends up as "is_s" at operation number 2. This is where you'll want to test multiple combinations of your strings to make sure you've covered all the cases you want to avoid. Once you've done that, you can go back and refine the patterns as needed, or find ways to optimize things so that you use fewer piped commands.
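One quick way to probe the combinations the answer warns about is to chain the passes, so that the cake pass runs on the output of the is pass (which the question's else branch appears to intend, although as written it discards the results). A minimal sketch, assuming GNU sed; the function name chain is hypothetical:

```shell
#!/bin/sh
# Same three operations as in this answer (GNU sed assumed).
word_underscorer() {
  printf '%s\n' "$1" |
    sed -r -e "s/($2)/_\1_/g" -e "s/($2)_s|^($2)(_*)/\1s\2_/g" -e "s/^_|_\$//g"
}

# Hypothetical helper: feed each word's pass the previous pass's output.
chain() {
  out=$1
  for word in is cake; do
    out=$(word_underscorer "$out" "$word")
  done
  printf '%s\n' "$out"
}

for s in chocolatecakes thecakeismine cakessurpassexpectation; do
  chain "$s"
done
```

With GNU sed this should print chocolate_cakes, the_cake__is_mine and cakes_surpassexpectation. Once "is" has been wrapped in underscores, the cake pass adds another underscore right next to it, and operation 3 only strips underscores at the very start or end of the string.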
Upvotes: 1