Reputation: 1789
I'm not sure whether what I want is possible with sed (or awk, or any other bash tool).
I want to write a script that replaces : ) in a string with <happy> and ) : with <sad>. Each replacement on its own is easy with sed:
echo "test : )" | sed 's/: )/<happy>/g'
echo "test ) :" | sed 's/) :/<sad>/g'
Unfortunately, sometimes I have strings like these:
I'm happy : ) : ) : )
I'm sad ) : ) : ) :
In that case, the output should be:
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
But by combining the two commands above:
echo "I'm happy : ) : ) : )" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'
echo "I'm sad ) : ) : ) :" | sed 's/: )/<happy>/g' | sed 's/) :/<sad>/g'
I will get:
I'm happy <happy> <happy> <happy>
I'm sad ) <happy> <happy> :
The way to solve this would be to do both replacements in parallel, processing the string from left to right. I tried something like sed 's/a/b/g;s/c/d/g', but that still applies one pattern after the other, so it doesn't solve the problem.
Upvotes: 9
Views: 1083
Reputation: 30910
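We can solve this problem in two passes: first wrap every emoticon in delimiters (I've used ! for both start and end, but you can use almost anything), then replace each delimited emoticon with its tag.
Here's a sed program that implements this approach: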
#!/bin/sed -f
s/) :\|: )/!&!/g
s/!: )!/<happy>/g
s/!) :!/<sad>/g
A note on the delimiters:
We can use any delimiter we want here, because we always re-match and replace the delimiters we introduce. That isn't the case in all sed scripts, and as a general rule it can be a good idea to use \n as the delimiter (if you're processing single lines) or another unlikely character (perhaps \0 or \377 if you're processing ordinary text).
Any character works in this script. For example, using a and b works just as well:
#!/bin/sed -f
s/) :\|: )/a&b/g
s/a: )b/<happy>/g
s/a) :b/<sad>/g
$ sed -f ../stackoverflow/51886023.sed <<<$'I\'m happy : ) : ) : )\nI\'m sad ) : ) : ) :'
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
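For quick experiments, the same mark-then-replace idea can be sketched as a shell function. This is only a sketch under assumptions: the function name replace_emoticons and the control character \001 as delimiter are my choices for illustration, not part of the script above, and it assumes a sed that accepts -E.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): two-pass replacement on stdin.
replace_emoticons() {
    # Assumed delimiter: the byte \001, unlikely to occur in ordinary text.
    sep=$(printf '\001')
    # The first expression wraps every emoticon in the delimiter; the next
    # two turn each wrapped emoticon into its tag.
    sed -E \
        -e "s/\\) :|: \\)/${sep}&${sep}/g" \
        -e "s/${sep}: \\)${sep}/<happy>/g" \
        -e "s/${sep}\\) :${sep}/<sad>/g"
}

printf '%s\n' "I'm happy : ) : ) : )" "I'm sad ) : ) : ) :" | replace_emoticons
# I'm happy <happy> <happy> <happy>
# I'm sad <sad> <sad> <sad>
```

Because the marking step scans left to right in a single pass, an input such as ": ) :" becomes "<happy> :" rather than ": <sad>".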
Upvotes: 1
Reputation: 3671
If you have Perl available, it does a good job of this problem. Its e option on substitutions makes the code short and, for Perl, tidy.
my %map = (
    ": )" => "<happy>",
    ") :" => "<sad>",
);
while (<>) {
    s/\: \)|\) \:/$map{$&}/ge;
    print;
}
The general case, where the regular expression is built from the map, is solved in the script below. The subtlety in Perl is that its regular expression engine takes the first alternative that matches in an | alternation. The upshot is that the alternatives need to be sorted longest to shortest; otherwise, in the example below, : )) might get matched by : ) .
$ cat script.pl
#!/usr/bin/perl -w
use strict;
my %map = (
    ": )" => "<happy>",
    ") :" => "<sad>",
    ": |" => "<meh>",
    ": ))" => "<really happy>",
);
my @map_regexes = keys %map;
# Sort by length so that longer keys are tried before their prefixes.
my @map_regexes_longest_first = sort { length($b) <=> length($a) } @map_regexes;
my @quoted_map_regexes = map(quotemeta, @map_regexes_longest_first);
my $map_regex = join("|", @quoted_map_regexes);
while (<>) {
    s/$map_regex/$map{$&}/ge;
    print;
}
$ cat file.txt
I'm happy : ) : ) : )
I'm sad ) : ) : ) :
I'm meh : | : | : |
I'm really happy : )) : )) : ))
$ perl -w script.pl <file.txt
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
I'm meh <meh> <meh> <meh>
I'm really happy <really happy> <really happy> <really happy>
Upvotes: 4
Reputation: 23677
For the given sample (i.e. dealing with two overlapping matches), this can also be solved with sed by looping:
$ cat ip.txt
I am happy : ) : ) : )
I am sad ) : ) : ) :
: ) : ) : )
) : ) : ) :
) : : ) :
: ) ) :
$ # GNU version: sed -E -e ':a s/(^|[^)].): \)/\1<happy>/g; ta' -e 's/\) :/<sad>/g'
$ sed -E -e ':a' -e 's/(^|[^)].): \)/\1<happy>/g' -e 'ta' -e 's/\) :/<sad>/g' ip.txt
I am happy <happy> <happy> <happy>
I am sad <sad> <sad> <sad>
<happy> <happy> <happy>
<sad> <sad> <sad>
<sad> <happy> :
<happy> <sad>
-e ':a' defines the label a
s/(^|[^)].): \)/\1<happy>/g replaces : ) with <happy> as long as the 2nd character before it is not )
-e 'ta' branches to label a if a substitution succeeded; looping is required because each match also inspects the two characters before the emoticon, so an emoticon that directly follows another one only matches on a later pass
s/\) :/<sad>/g once all the happy emojis are replaced, changes all the sad emojis in one go
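To see why the loop matters, it may help to compare a single pass with the looping version. This is a small sketch, assuming a sed that supports -E; the function names happy_once and happy_loop are illustrative only.

```shell
#!/bin/sh
# Single pass, no loop (function names are illustrative assumptions).
happy_once() { sed -E -e 's/(^|[^)].): \)/\1<happy>/g'; }
# Same substitution, repeated via the :a / ta loop until nothing changes.
happy_loop() { sed -E -e ':a' -e 's/(^|[^)].): \)/\1<happy>/g' -e 'ta'; }

printf '%s\n' 'I am happy : ) : ) : )' | happy_once   # I am happy <happy> : ) : )
printf '%s\n' 'I am happy : ) : ) : )' | happy_loop   # I am happy <happy> <happy> <happy>
```

In the single pass, the second and third emoticons are skipped because the characters that would have to precede them were either already consumed by the previous match or are the ) of an emoticon that has not been replaced yet.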
For multiple mappings, here's a perl solution similar to the awk one:
$ perl -pe 'BEGIN{ $h{": )"}="<happy>"; $h{") :"}="<sad>";
$r = join "|", map quotemeta, keys %h; }
s/$r/$h{$&}/g' ip.txt
I am happy <happy> <happy> <happy>
I am sad <sad> <sad> <sad>
<happy> <happy> <happy>
<sad> <sad> <sad>
<sad> <happy> :
<happy> <sad>
$h{": )"}="<happy>" creates a hash of key-value pairs
$r = join "|", map quotemeta, keys %h creates a regex alternation from all the keys of hash %h
... map quotemeta escapes all characters other than [A-Za-z_0-9] in each hash key
s/$r/$h{$&}/g searches and replaces
Upvotes: 2
Reputation: 204164
With GNU awk for the 3rd arg to match():
$ cat script1.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
}
{
    while ( match($0,/(.*)(: \)|\) :)(.*)/,a) ) {
        $0 = a[1] map[a[2]] a[3]
    }
    print
}
$ awk -f script1.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
With any awk:
$ cat script2.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
}
{
    while ( match($0,/: \)|\) :/) ) {
        $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
    }
    print
}
$ awk -f script2.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
Although both approaches produce the same output in this case, the first approach actually works from the end of the string to the front, courtesy of the leading .*, while the second approach works front to back. You can see that with this test:
$ echo ': ) :' | awk -f script1.awk
: <sad>
$ echo ': ) :' | awk -f script2.awk
<happy> :
You can do a back-to-front pass with any awk with a tweak but I don't think that's what you really want anyway.
Edit to build the regexp from the map:
$ cat tst.awk
BEGIN {
    map[": )"] = "<happy>"
    map[") :"] = "<sad>"
    for (emoji in map) {
        gsub(/[^^]/,"[&]",emoji)
        gsub(/\^/,"\\^",emoji)
        emojis = (emojis == "" ? "" : emojis "|") emoji
    }
}
{
    while ( match($0,emojis) ) {
        $0 = substr($0,1,RSTART-1) map[substr($0,RSTART,RLENGTH)] substr($0,RSTART+RLENGTH)
    }
    print
}
$ awk -f tst.awk file
I'm happy <happy> <happy> <happy>
I'm sad <sad> <sad> <sad>
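To make the regexp-building step more concrete, here is a small sketch that prints the fragment produced for a single key. The wrapper function emoji_to_regex is hypothetical; the two gsub() calls are the ones from tst.awk.

```shell
#!/bin/sh
# Hypothetical helper: print the regexp fragment built for one map key.
emoji_to_regex() {
    awk -v emoji="$1" 'BEGIN {
        gsub(/[^^]/, "[&]", emoji)   # wrap every non-caret character in [...]
        gsub(/\^/, "\\^", emoji)     # carets get a backslash instead
        print emoji
    }'
}

emoji_to_regex ': )'   # prints: [:][ ][)]
```

Wrapping each character in a bracket expression makes it literal without needing to know which characters are regexp metacharacters; the caret is handled separately because [^] would not be a valid way to match it.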
Upvotes: 5