Reputation: 23
I have several text files that I need to modify. They look like this:
Tag: Brown
Chair
Pencil
Tag: Red
Apple
Shirt
Pant
# <--- some files have one or more (fewer than five) blank lines here
Tag: Black
Wall
I would like to take the word after "Tag:" as a variable and prepend it to each following line until the next "Tag:" is reached. The number of lines between "Tag:" lines may vary. Here is an example of the desired output:
Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants
# <--- blank line(s) stay blank
Black Wall and Walls
I looked at and modified some of the samples at http://sed.sourceforge.net/, but still had no success.
sed ':loop; $!N; /^Tag:/h; n; /^Tag:/!b next; t loop; :next; x; p; x'
Thank you.
**Update**
Following @jaypal's suggestion, and after looking "carefully" at each text file, I'm adding the "blank line(s)" scenario.
Upvotes: 2
Views: 78
Reputation: 204446
Given an input file as posted in the question and with 2 blank lines:
$ awk '/^Tag:/{tag=$2; next} {print (NF ? tag " " $0 " and " $0 "s" : $0)}' file
Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants


Black Wall and Walls
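Since the question mentions several files, here is a rough sketch of running the same awk program over each one. The temporary directory and the file names `a.txt` and `b.txt` are made up for illustration, and writing to a `.tmp` file before `mv` is just one cautious way to replace a file, not something from the question.

```shell
# Hypothetical working directory with two sample files (names are made up).
dir=$(mktemp -d)
printf 'Tag: Brown\nChair\n' > "$dir/a.txt"
printf 'Tag: Red\nApple\n\nTag: Black\nWall\n' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    # Write to a temporary file first; replace the original only if awk succeeded.
    awk '/^Tag:/{tag=$2; next} {print (NF ? tag " " $0 " and " $0 "s" : $0)}' "$f" > "$f.tmp" &&
        mv "$f.tmp" "$f"
done

# Show one of the rewritten files.
cat "$dir/b.txt"
```

Try it on copies of the real files first, since the originals are overwritten.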
Upvotes: 0
Reputation: 99154
My attempt with sed (without loops or branches, I like things simple):
sed '/Tag:/{s/Tag: //;h;d;};G;s/\(.*\)\n\(.*\)/\2 \1 and \1s/'
EDIT:
To preserve blank lines:
sed '/Tag:/{s/Tag: //;h;d;};/./{G;s/\(.*\)\n\(.*\)/\2 \1 and \1s/;}'
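For anyone puzzling over the one-liner, here is the same script spread over several lines with a comment per command. The sample input is a shortened version of the data in the question, and `sample.txt` is just a scratch file name.

```shell
# Build a small sample input (shortened from the question), including a blank line.
printf 'Tag: Brown\nChair\nPencil\nTag: Red\nApple\n\nTag: Black\nWall\n' > sample.txt

out=$(sed '
/Tag:/{
    # Strip the "Tag: " prefix, leaving just the colour
    s/Tag: //
    # Copy the colour into the hold space
    h
    # Delete the tag line and start the next cycle
    d
}
/./{
    # Non-empty line: append a newline plus the hold space (the colour)
    G
    # Pattern space is now "item\ncolour"; rearrange it
    s/\(.*\)\n\(.*\)/\2 \1 and \1s/
}
' sample.txt)

printf '%s\n' "$out"
```

Blank lines do not match `/./`, so they skip the second block and are printed unchanged.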
Upvotes: 2
Reputation: 74685
The following code deals with the most trivial of pluralisations (as in your example):
awk '/^Tag:/ {c=$2; next} {print c, $1, "and", $1"s"}' file
If the pattern matches, save the second field to c and skip to the next line. Otherwise, print the first word on the line with the simple pluralisation.
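If appending "s" is too crude but a Perl module is not an option, a middle ground is a small helper function inside awk. The sketch below is only an illustration: `plural` is a hypothetical helper covering a handful of common suffix rules, and it will get irregular words (Mouse, Child, and so on) wrong. It also passes blank lines through untouched.

```shell
out=$(printf 'Tag: Red\nBox\nCherry\nDish\nApple\n\nToy\n' |
awk '
# Hypothetical helper: a few common English suffix rules, nowhere near complete.
function plural(w) {
    if (w ~ /(s|x|z|ch|sh)$/) return w "es"
    if (w ~ /[^aeiou]y$/)     return substr(w, 1, length(w) - 1) "ies"
    return w "s"
}
# Tag line: remember the second field and move on.
/^Tag:/ { c = $2; next }
# Non-blank line: print tag, word, and its guessed plural.
NF      { print c, $1, "and", plural($1); next }
# Blank line: print it as it is.
        { print }
')

printf '%s\n' "$out"
```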
For something a bit more upmarket that is capable of pluralising a wider range of words, you could use the Lingua::EN::Inflect Perl module:
perl -MLingua::EN::Inflect=PL -lane 'if(@F==2){$c=$F[1]}else{print "@{[$c,$_,q/and/,PL $_]}"}' file
Use -a to enable auto-split mode. If there are two fields, save the second one to $c (you could also do this using regex, I just fancied some variety). Otherwise, print the list. Using the @{[ ]} and wrapping in double quotes uses the built-in variable $" to join the list, which is a space by default.
Testing it out:
$ cat file
Tag: Brown
Chair
Pencil
Tag: Red
Apple
Shirt
Pant
Tag: White
Mouse
$ perl -MLingua::EN::Inflect=PL -lane 'if(@F==2){$c=$F[1]}else{print "@{[$c,$_,q/and/,PL $_]}"}' file
Brown Chair and Chairs
Brown Pencil and Pencils
Red Apple and Apples
Red Shirt and Shirts
Red Pant and Pants
White Mouse and Mice
Upvotes: 2