Reputation: 23
I have a text file with alternating lines starting with 'WordNode' and 'gloss word', but sometimes there are several consecutive lines starting with 'gloss word':
WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
gloss word "turned on"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
gloss word "ether"
I'd like to insert the preceding 'WordNode' line before each of those additional lines starting with 'gloss word':
WordNode "a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"
I tried this:
sed -r ':a; N; /(gloss word)[^\n]*\n\1/ s/\n.*//; ta; P; D' file1.txt > file2.txt
but it just keeps the first 'gloss word' line and deletes the ones that follow. What would be the correct way to do this using sed, awk, or any other regular-expression tool?
Upvotes: 0
Views: 126
Reputation: 203169
$ awk '/WordNode/{h=$0 ORS;next} {print h $0}' file
WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"
Upvotes: 0
Reputation: 58351
This might work for you (GNU sed):
sed '/WordNode/h;//d;x;p;x' file
Store the line containing WordNode in the hold space (HS) and then delete it. For all other lines, i.e. lines containing gloss word, swap to the HS, print the HS, then swap back to the pattern space (PS) and print that.
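For readers who find the one-liner terse, here is the same sequence of commands spread over several lines with comments. This is only a sketch: the wrapper name add_wordnode is made up for illustration, and the commented multi-line layout is assumed to behave the same as the semicolon form.

```shell
# add_wordnode is a hypothetical wrapper name; it reads the named files or stdin.
add_wordnode() {
  sed '
# WordNode line: copy it into the hold space
/WordNode/h
# the empty regex // reuses /WordNode/, so the same line is deleted unprinted
//d
# any other line: swap in the saved WordNode line, print it, swap back
x
p
x
# the original line is then printed automatically when the cycle ends
' "$@"
}
```

It could be called as add_wordnode file1.txt > file2.txt.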
Upvotes: 1
Reputation: 67467
awk to the rescue!
$ awk '/^WordNode/{header=$0; p=0} p{print header} /^gloss word/{p=1} 1' file
WordNode"a'inai"
gloss word "repose"
WordNode "akti"
gloss word "running"
WordNode "akti"
gloss word "turned on"
WordNode "akti"
gloss word "active"
WordNode "aitco"
gloss word "Armenian"
WordNode "aitxero"
gloss word "ethereal"
WordNode "aitxero"
gloss word "ether"
Upvotes: 1
Reputation: 1825
This is most easily done with a shell script rather than sed or awk, like so:
while IFS= read -r line; do
  if [[ $line == WordNode* ]]; then wnl=$line; else echo "$wnl"; echo "$line"; fi
done < file1.txt
(This only echoes the last WordNode line before the gloss word line, so if you expect to have multiple WordNode lines together and want to echo them all, you'd have to tweak it to remember every line in the run.)
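One possible way to handle several WordNode lines in a row is sketched below. It is not part of the original answer: the function name repeat_headers and the variables headers and in_run are invented for illustration, and it uses plain POSIX sh constructs (case rather than [[ ]]) so it should also run under bash.

```shell
# repeat_headers is a hypothetical name; it reads stdin and writes stdout.
repeat_headers() {
  headers=''   # all WordNode lines of the current run, newline-separated
  in_run=0     # 1 while the previous line was also a WordNode line
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      WordNode*)
        if [ "$in_run" -eq 1 ]; then
          # another WordNode line directly after one: append it to the run
          headers="$headers
$line"
        else
          # first WordNode line after a gloss line: start a new run
          headers=$line
        fi
        in_run=1
        ;;
      *)
        # any other line: print the saved run (if any), then the line itself
        if [ -n "$headers" ]; then
          printf '%s\n' "$headers"
        fi
        printf '%s\n' "$line"
        in_run=0
        ;;
    esac
  done
}
```

It could be called as repeat_headers < file1.txt > file2.txt.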
Upvotes: 0