BitTad

Reputation: 23

How to add a string in between consecutive lines starting with the same word?

I have a text file whose lines alternate between ones starting with 'WordNode' and ones starting with 'gloss word', but sometimes several consecutive lines start with 'gloss word':

WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
gloss word "turned on"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
gloss word "ether"

I'd like to insert the preceding 'WordNode' line before each of these extra lines starting with 'gloss word':

WordNode "a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"

I tried this:

sed -r ':a; N; /(gloss word)[^\n]*\n\1/ s/\n.*//; ta; P; D' file1.txt > file2.txt

but it just keeps the first 'gloss word' line and deletes the consecutive ones that follow. What would be the correct way to do this using sed, awk, or any other regular-expression tool?

Upvotes: 0

Views: 126

Answers (4)

Ed Morton

Reputation: 203169

$ awk '/WordNode/{h=$0 ORS;next} {print h $0}' file
WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"

Upvotes: 0

potong

Reputation: 58351

This might work for you (GNU sed):

sed '/WordNode/h;//d;x;p;x' file

Store each line containing WordNode in the hold space (HS) and then delete it. For all other lines, i.e. lines containing gloss word, swap to the HS, print it, then swap back to the pattern space (PS), which is printed automatically at the end of the cycle.
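
For reference, here is the same command written one sed command per line, with a small sample run. This is only a sketch to make the hold-space steps visible; it assumes GNU sed, and sample.txt is a made-up file name used just for the demo.

```shell
# Build a small sample input (sample.txt is a hypothetical file name).
printf '%s\n' \
  'WordNode "akti"' \
  'gloss word "running"' \
  'gloss word "active"' > sample.txt

# Same logic as the one-liner, one sed command per line.
result=$(sed '
  # WordNode line: copy it into the hold space.
  /WordNode/h
  # The empty regex reuses /WordNode/: drop the line and start the next cycle.
  //d
  # Any other line: swap, so the pattern space now holds the WordNode line.
  x
  # Print the saved WordNode line.
  p
  # Swap back; the gloss line is printed automatically at the end of the cycle.
  x
' sample.txt)
printf '%s\n' "$result"
```

With this input, the WordNode line should appear once before each of the two gloss word lines.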

Upvotes: 1

karakfa

Reputation: 67467

awk to the rescue!

$ awk '/^WordNode/{header=$0; p=0} p{print header} /^gloss word/{p=1} 1' file

WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"
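
In case the one-liner is hard to read, the same program can be spread over several lines with comments. This is just an annotated sketch of the command above: the flag is renamed from p to seen for readability, and sample.txt is a hypothetical file name.

```shell
# Build a small sample input (sample.txt is a hypothetical file name).
printf '%s\n' \
  'WordNode "akti"' \
  'gloss word "running"' \
  'gloss word "active"' \
  'WordNode "aitco"' \
  'gloss word "Armenian"' > sample.txt

result=$(awk '
  # New WordNode line: remember it and clear the "gloss already seen" flag.
  /^WordNode/   { header = $0; seen = 0 }
  # A gloss line has already followed this header, so repeat the header first.
  seen          { print header }
  # Mark that the current header now has at least one gloss line.
  /^gloss word/ { seen = 1 }
  # Print every input line unchanged.
  { print }
' sample.txt)
printf '%s\n' "$result"
```

The order of the rules matters: the flag is tested before it is set, so the first gloss word line after a WordNode line does not trigger an extra header.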

Upvotes: 1

blackghost

Reputation: 1825

This is most easily done with a shell script rather than sed or awk, like so:

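while IFS= read -r line; do
    # Remember the latest WordNode line; print it before every other line.
    if [[ $line == WordNode* ]]; then wnl=$line; else printf '%s\n' "$wnl" "$line"; fi
done < file1.txt   # "<" redirects the file; "<<" would start a here-document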

(This only echoes the last WordNode line seen before each gloss word line, so if you expect several WordNode lines in a row and want to echo them all, you'd have to tweak it to accumulate them.)
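
One possible version of that tweak, as a rough sketch: it collects every WordNode line of a run and repeats the whole run before each gloss word line. The function name repeat_headers and the sample words are invented for illustration, and it is written in plain POSIX sh (case instead of [[ ]]) so it should also run under bash.

```shell
# repeat_headers is a hypothetical helper name; it reads lines from stdin.
repeat_headers() {
  nl='
'
  headers=''          # the current run of WordNode lines, newline-terminated
  prev_was_header=0   # 1 while we are inside a run of WordNode lines
  while IFS= read -r line; do
    case $line in
      WordNode*)
        # A WordNode line right after a gloss line starts a new run.
        [ "$prev_was_header" -eq 1 ] || headers=''
        headers="${headers}${line}${nl}"
        prev_was_header=1 ;;
      *)
        # Repeat the whole run of WordNode lines before this line.
        printf '%s%s\n' "$headers" "$line"
        prev_was_header=0 ;;
    esac
  done
}

# Made-up sample input with two WordNode lines in a row.
result=$(printf '%s\n' \
  'WordNode "a"' \
  'WordNode "b"' \
  'gloss word "x"' \
  'gloss word "y"' \
  'WordNode "c"' \
  'gloss word "z"' | repeat_headers)
printf '%s\n' "$result"
```

To run it on the real file, use repeat_headers < file1.txt > file2.txt.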

Upvotes: 0
