Bharath_Raja

Reputation: 642

How to replace every second occurrence of a word in a text file

In a file called sample.txt, I have the following text:

Once there is a tortoise and a rabbit. The rabbit was fast, tortoise was slow. Rabbit used to mock the tortoise. Once the rabbit challenged the tortoise for a race. Tortoise accepted rabbit’s request. Rabbit was overconfident. Rabbit thought to win the race. Rabbit ran fast. Then rabbit got tired. Rabbit wanted to take rest. So rabbit slept under the tree. Tortoise kept going and won the race.

How can I replace every second occurrence of rabbit with hare using Unix commands?

Upvotes: 2

Views: 381

Answers (2)

Walter A

Reputation: 20022

If the input is a single line (or you are happy to restart the count at the beginning of each line) and you want to ignore the uppercase Rabbit, you can use this solution: first replace every rabbit with a single character that sed can match, then replace every second rabbit character with hare, and finally restore the remaining rabbits.

sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g' sample.txt

Edit, Additional explanation:
When the input file is a clean Unix-style file (no MS-DOS line endings \r\n), we know that the character \r does not occur in it. After s/rabbit/\r/g, each rabbit is represented by \r (the r here stands for return, not for the first letter of rabbit).
Now you want to look for sequences <rabbit><not-a-rabbit><rabbit>. In our new notation that is \r[^\r]*\r, where [^\r]* stands for any sequence of characters that does not contain the rabbit character.
When we find 2 rabbits, we want to remember the first rabbit together with the non-rabbit characters that follow it. In sed you can remember a matched sequence with \(..\), or with (..) when you use the option -r. You can recall the first remembered group (we only have one here) with \1, in this case the first rabbit \r and the non-rabbit characters. The second rabbit \r is replaced by hare.
After replacing the second \r (globally on the line, so every second one), we turn the remaining \r rabbits back into the string rabbit.
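
To see the three substitutions at work, here is a small check on a made-up one-line input. This is only a sketch: it assumes GNU sed (for -r and the \r escape), and the sample string and the variable name result are invented for illustration.

```shell
# Assumes GNU sed; the input line is an invented example.
# Step 1 turns every "rabbit" into \r, step 2 rewrites the second \r of each
# pair to "hare", step 3 turns the remaining \r back into "rabbit".
result=$(printf 'rabbit A rabbit B rabbit C rabbit\n' |
  sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g')
printf '%s\n' "$result"
# prints: rabbit A hare B rabbit C hare
```

The 1st and 3rd rabbits survive, while the 2nd and 4th become hare, because the g flag continues matching after each replaced pair.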

More possibilities
When your input file has more than 1 line, you might want something different. With one rabbit on the first line and one rabbit on the second, how can you catch the second rabbit? Before running the sed command above, you need to join your input file into 1 line. Afterwards you want to restore the line endings, so you first need to replace them with a special character. Normally I would use \r for this, but that character is reserved for the rabbits. The character \v is possible too, resulting in

tr '\n' '\v' < sample.txt | 
   sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g' | 
   tr '\v' '\n'
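
A quick way to convince yourself that the count now carries across lines is to feed the pipeline a few short lines with one rabbit each. This is a sketch under the same assumption (GNU sed); the three input lines and the variable name result are made up.

```shell
# Assumes GNU sed; the three input lines are invented.
# tr joins the lines with \v so that sed sees a single line,
# and the final tr restores the newlines.
result=$(printf 'one rabbit\ntwo rabbit\nthree rabbit\n' |
  tr '\n' '\v' |
  sed -r 's/rabbit/\r/g; s/(\r[^\r]*)\r/\1hare/g; s/\r/rabbit/g' |
  tr '\v' '\n')
printf '%s\n' "$result"
# prints:
# one rabbit
# two hare
# three rabbit
```

Without the two tr commands, each line would contain only one rabbit and nothing would be replaced.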

When you also want to replace uppercase Rabbits, we can translate those Rabbits into \a.
You can then match any rabbit (uppercase or lowercase) with [\r\a], which makes the command one level more complex:

tr '\n' '\v' < sample.txt | 
sed -r 's/rabbit/\r/g; s/Rabbit/\a/g; 
        s/([\r\a][^\r\a]*)[\r\a]/\1hare/g;
        s/\r/rabbit/g; s/\a/Rabbit/g' |
tr '\v' '\n'

When you want to replace the uppercase Rabbit \a with an uppercase Hare, the command gets even more complex (you need another special character).
I use \x01 to mark a [Rr]abbit that has to be changed.

tr '\n' '\v' < sample.txt | 
   sed -r 's/rabbit/\r/g;
       s/Rabbit/\a/g;
       s/([\r\a][^\r\a]*)([\r\a])/\1\x01\2/g;
       s/\x01\r/hare/g;
       s/\x01\a/Hare/g;
       s/\r/rabbit/g; s/\a/Rabbit/g' |
tr '\v' '\n'
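
As a sanity check of the case-preserving variant, the same pipeline can be run on a small mixed-case input. Again this is only a sketch that assumes GNU sed (for the \a and \x01 escapes); the four input lines and the variable name result are invented.

```shell
# Assumes GNU sed; the four input lines are invented.
# \r stands for "rabbit", \a for "Rabbit", and \x01 marks every second one.
result=$(printf 'Rabbit one\nrabbit two\nRabbit three\nRabbit four\n' |
  tr '\n' '\v' |
  sed -r 's/rabbit/\r/g;
          s/Rabbit/\a/g;
          s/([\r\a][^\r\a]*)([\r\a])/\1\x01\2/g;
          s/\x01\r/hare/g;
          s/\x01\a/Hare/g;
          s/\r/rabbit/g; s/\a/Rabbit/g' |
  tr '\v' '\n')
printf '%s\n' "$result"
# prints:
# Rabbit one
# hare two
# Rabbit three
# Hare four
```

The 2nd occurrence is lowercase and becomes hare, the 4th is uppercase and becomes Hare, and the 1st and 3rd are restored unchanged.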

Upvotes: 2

Bharath_Raja

Reputation: 642

$ sed 's/[Rr]abbit/hare/2' sample.txt

Upvotes: -1
