Reputation: 305
I realize that title is horrible but anyways, I have an assignment to change all instances of "cat" to "dog" using sed. Simple enough, but the text also includes words like "catapult" and "bearcat", which I tried to avoid by putting a space in the pattern. My problem is that every match becomes "dog", where in certain instances I want it to be "Dog" or "dogs"...
Here's the text file I'm changing:
Dear Homeowner,
Cats are important to people. We all enjoy the company of cats. If you have ever wanted to own a cat we can help. We are attempting to hold a “cat comes home” day for our city. To help us we've enlisted the NWMSU Bearcat cheerleaders, the organizers of the city's annual catapult toss, and local celebrities to help find homes for cats.
There is a cat that needs you to provide a home for them. So if you are a cat lover, please come and see if there isn't some way to find a home in your heart for a cat.
Thanks!!
Cats are people too.
And here is the output I'm getting which is obviously wrong
Dear Homeowner,
dog are important to people. We all enjoy the company of cats. If you have ever wanted to own a cat we can help. We are attempting to hold a “cat comes home” day for our city. To help us we've enlisted the NWMSU Bearcat cheerleaders, the organizers of the city's annual catapult toss, and local celebrities to help find homes for cats.
There is a cat that needs you to provide a home for them. So if you are a cat lover, please come and see if there isn't some way to find a home in your heart for a cat.
Thanks!!
dog are people too.
And this is my code:
sed 's/[Cc]at[s] /dog /g' cats-dogs.txt
Upvotes: 3
Views: 208
Reputation: 246877
Using perl, but it ain't pretty:
perl -pe 's/\b(c)at(?=s?\b)/ $1 =~ m{[[:upper:]]} ? "Dog" : "dog" /ige' <<END
scat cat cats Cats Cat Catskills
END
outputs
scat dog dogs Dogs Dog Catskills
Upvotes: 0
Reputation: 4185
I'm pretty sure you can't do this in (a single) RegEx alone.
That said, the simple solution might be the best here, as there are only two possible cases (upper and lower) and one replacement word, it seems (also, sed allows multiple replacements easily).
So something like this should work (assuming GNU sed):
sed -r 's/\bCat(s?)\b/Dog\1/g; s/\bcat(s?)\b/dog\1/g' cats-dogs.txt
Using extended regexp as it's far less horrible to quote on the command line. Note the scanning for word boundaries here too.
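To see the word boundaries doing their job, here is a small sketch that runs the same two substitutions over a made-up sample line. It assumes GNU sed (for -r and \b); the sample text and the swap_cats function name are just for illustration.

```shell
# Made-up sample mixing real matches with look-alikes.
sample='Cats and cats: a cat, the Bearcat, a catapult, scat.'

# Same two substitutions as above, wrapped in a helper (hypothetical name).
# Assumes GNU sed, which understands -r and \b.
swap_cats() {
  sed -r 's/\bCat(s?)\b/Dog\1/g; s/\bcat(s?)\b/dog\1/g'
}

printf '%s\n' "$sample" | swap_cats
```

"Bearcat", "catapult" and "scat" should come through untouched, because "cat" there is not delimited by a word boundary on both sides.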
There's probably a very clever (and unreadable) sed way of doing this using \u and buffers too.
Upvotes: 3
Reputation: 46856
Let's parse your attempt so far.
s/[Cc]at[s] /dog /g
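This searches for the regex [Cc]at[s] followed by a space, and substitutes dog followed by a space. There are a few reasons it doesn't work...
[s] just means "the letter s"; it is required rather than optional, so only "cats" or "Cats" followed by a space can match, and the replacement is always a lowercase "dog" with no "s".
If you're using Linux, then the version of sed installed on your system is probably GNU sed, with which the following might work: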
sed -r 's/\bcat(s?)\b/dog\1/g;s/\bCat(s?)\b/Dog\1/g'
Note the -r option, which tells sed to use "Extended" regular expression notation rather than its default "Basic" notation.
This solution relies on sed's understanding of the \b word boundary, but it's important to note that this shorthand is NOT universally available in the sed implementations on other operating systems (FreeBSD, OSX, Solaris, etc). If portability is important, avoid using \b and similar things.
This shorthand is nice, but really isn't required. Here's the same thing in BRE:
sed 's/[[:<:]]cat\(s*\)[[:>:]]/dog\1/g;s/[[:<:]]Cat\(s*\)[[:>:]]/Dog\1/g'
This is BRE instead of ERE, so we don't use the -r option. I should point out that this will also match "catssss" because we're using s* instead of s?. BRE has no standard ? operator, so s* is the simple stand-in here.
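If the "catssss" case matters, a POSIX interval expression is one way to ask for "at most one s" in BRE. The sketch below compares the two quantifiers. It uses \b only so that it runs on GNU sed; on a BSD sed, the [[:<:]] and [[:>:]] classes from the command above would take its place. The function names and sample words are made up for illustration.

```shell
# Made-up sample: the last word should ideally be left alone.
words='cat cats catssss'

# s* matches any run of s, so "catssss" is rewritten too.
star() { sed 's/\bcat\(s*\)\b/dog\1/g'; }

# s\{0,1\} matches at most one s, so "catssss" no longer matches.
interval() { sed 's/\bcat\(s\{0,1\}\)\b/dog\1/g'; }

printf '%s\n' "$words" | star
printf '%s\n' "$words" | interval
```

The first command should print "dog dogs dogssss" and the second "dog dogs catssss".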
The traditional classes [[:<:]] and [[:>:]] match only at the beginning and only at the end of a word, respectively, which may sometimes be preferred over GNU sed's \b "word boundary", which matches at either the beginning or the end of a word.
The non-GNU RE format can be seen on any unix with man re_format.
(NOTE: sed's -r option is also not universal. In OSX, use -E instead. This is because OSX's sed is derived from an older version of FreeBSD, which only added -r as an equivalent option to -E a few versions ago.)
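If a script has to run on systems with different sed builds, one option is to probe which extended-regex flag the local sed accepts before using it. This is only a sketch: ERE_FLAG is a hypothetical variable name, and the probe simply checks whether a trivial -E command succeeds.

```shell
# Probe: does this sed accept -E? If not, fall back to -r.
# ERE_FLAG is a made-up variable name for this sketch.
if printf 'x\n' | sed -E 's/(x)/\1/' >/dev/null 2>&1; then
  ERE_FLAG=-E
else
  ERE_FLAG=-r
fi

# Use whichever flag was accepted; no \b here, so nothing GNU-specific.
printf 'cat cats\n' | sed "$ERE_FLAG" 's/cat(s?)/dog\1/g'
```

The example deliberately leaves out word boundaries, since those are a separate portability question from the flag itself.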
Upvotes: 3