JoCM

Reputation: 23

Take the word after each "Tag:" line and insert it into the following lines using sed or awk

I have several text files that I need to modify. They look like this:

Tag: Brown
Chair
Pencil
Tag: Red
Apple
Shirt
Pant
         # <--- some files have one or more (fewer than five) blank lines here
Tag: Black
Wall

I would like to reformat them by taking the word after "Tag:" as a variable and prepending it to each following line until the next "Tag:" is reached. The number of lines between "Tag:" lines varies. Here is an example of the desired output:

Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants
         # <--- blank line(s) stay blank
Black Wall and Walls

I looked at the samples at http://sed.sourceforge.net/ and tried modifying some of them, but with no success so far:

sed ':loop; $!N; /^Tag:/h; n; /^Tag:/!b next; t loop; :next; x; p; x'

Thank you.

**Update**

Following @jaypal's suggestion, and after looking more carefully at each text file, I have added the blank-line scenario.

Upvotes: 2

Views: 78

Answers (3)

Ed Morton

Reputation: 204446

Given an input file as posted in the question, with 2 blank lines before the last tag:

$ awk '/^Tag:/{tag=$2; next} {print (NF ? tag " " $0 " and " $0 "s" : $0)}' file
Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants


Black Wall and Walls
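
Since the question mentions several files, the same awk command could be applied to each file in turn. The following is only a sketch: the function name `retag_files` is hypothetical, and it assumes `mktemp` is available and that overwriting the originals is acceptable (keep backups first).

```shell
# Hypothetical helper: rewrite each given file in place via a temporary file.
retag_files() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # Same awk program as above: remember the tag, prefix non-blank lines,
    # and print blank lines unchanged.
    if awk '/^Tag:/{tag=$2; next} {print (NF ? tag " " $0 " and " $0 "s" : $0)}' "$f" > "$tmp"; then
      # Copy the result back so the original file keeps its permissions.
      cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
  done
}

# Usage (assumed file names):
#   retag_files *.txt
```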

Upvotes: 0

Beta

Reputation: 99154

My attempt with sed (without loops or branches; I like things simple):

sed '/Tag:/{s/Tag: //;h;d;};G;s/\(.*\)\n\(.*\)/\2 \1 and \1s/'

EDIT:

To preserve blank lines:

sed '/Tag:/{s/Tag: //;h;d;};/./{G;s/\(.*\)\n\(.*\)/\2 \1 and \1s/;}'
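
For readers less familiar with the hold space, here is the blank-line-preserving command again, spread over several lines with comments. It is a sketch of the same script, not a different method; the wrapper name `tag_lines` is hypothetical. Note that `/./` treats a line containing only spaces as non-blank, so this assumes blank lines are truly empty.

```shell
# Hypothetical wrapper around the sed command above; reads files or stdin.
tag_lines() {
  sed '
    # "Tag:" line: strip the prefix, copy the tag to the hold space,
    # then delete the line so nothing is printed for it.
    /Tag:/{
      s/Tag: //
      h
      d
    }
    # Non-empty line: append the held tag after a newline, then
    # rearrange "word\ntag" into "tag word and words".
    /./{
      G
      s/\(.*\)\n\(.*\)/\2 \1 and \1s/
    }
    # Empty lines match neither block and are printed unchanged.
  ' "$@"
}

printf 'Tag: Brown\nChair\n\nTag: Red\nApple\n' | tag_lines
```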

Upvotes: 2

Tom Fenech

Reputation: 74685

The following code deals with the most trivial of pluralisations (as in your example):

awk '/^Tag:/ {c=$2; next} {print c, $1, "and", $1"s"}' file

If the pattern matches, save the second field to c and skip to the next line. Otherwise, print the saved tag, the first word on the line, and that word with the simple pluralisation.
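
A middle ground, staying in awk, is to add a few suffix rules and to pass blank lines through untouched. This is only a sketch: the function names `tag_plural` and `plural` are hypothetical, the rules cover just a handful of common endings, and irregular plurals still come out wrong.

```shell
# Hypothetical helper: a few suffix rules only, no irregular plurals.
tag_plural() {
  awk '
    function plural(w) {
      if (w ~ /(s|x|z|ch|sh)$/) return w "es"                             # Box -> Boxes
      if (w ~ /[^aeiou]y$/)     return substr(w, 1, length(w) - 1) "ies"  # Berry -> Berries
      return w "s"                                                        # Chair -> Chairs
    }
    /^Tag:/ { c = $2; next }                          # remember the current tag
    NF      { print c, $1, "and", plural($1); next }  # non-blank line
            { print }                                 # blank line passes through
  ' "$@"
}

# Usage (assumed file name):
#   tag_plural file
```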

For something a bit more upmarket that is capable of pluralising a wider range of words, you could use the Lingua::EN::Inflect Perl module:

perl -MLingua::EN::Inflect=PL -lane 'if(@F==2){$c=$F[1]}else{print "@{[$c,$_,q/and/,PL $_]}"}' file

Use -a to enable auto-split mode, which splits each line into the array @F. If there are two fields, save the second one to $c (you could also do this using a regex, I just fancied some variety). Otherwise, print the list. Wrapping the list in @{[ ]} inside double quotes joins it with the built-in variable $", which is a space by default.

Testing it out:

$ cat file
Tag: Brown
Chair
Pencil
Tag: Red
Apple
Shirt
Pant
Tag: White
Mouse
$ perl -MLingua::EN::Inflect=PL -lane 'if(@F==2){$c=$F[1]}else{print "@{[$c,$_,q/and/,PL $_]}"}' file
Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants
White Mouse and Mice

Upvotes: 2
