Reputation: 35
I have a file with multiple columns like
abc cvn bla..bla..n_columns
xnt yuk m_columns
abc cvn xxxx
vbh ast
sth rty
xnt yuk
I want to create a new file that keeps only one line for each word pair repeated in the first two columns. The final file should look like:
abc cvn bla..bla..n_columns
xnt yuk m_columns
vbh ast
sth rty
Upvotes: 0
Views: 1096
Reputation: 10039
sed -n 'H
$ {x
s/$/\
/
: again
s/\(\n\)\([^ ]\{1,\} \{1,\}[^ [:cntrl:]]\{1,\}\)\(.*\)\1\2[^[:cntrl:]]*\n/\1\2\3\1/
t again
s/\n\(.*\)\n/\1/
p
}' YourFile
This works on any repeated pair of values in the whole text (a pair is two runs of non-space, non-newline characters separated by spaces). It loops for as long as a repeated pair is found and removed.
Principle:
H
Append each line from the working buffer to the hold buffer (sed reads line by line into a working buffer, and also has a separate hold buffer).
$
On the last line, run the block in braces.
x
Swap the working and hold buffers, so the whole file is now in the working buffer, but starting with a newline (because of the append action).
s/$/\<newline>/
Add a newline at the end (used as a delimiter by the later substitution).
: again
Set a label (for a later goto).
s/.../.../
This is the core of the process. Search for a pair of words at the start of a line (right after a newline) and a later line starting with the same pair. If found, replace the whole block with the part from the start of the block up to, but not including, the line with the second pair. (The block runs from the first pair to the newline that ends the line holding the second pair.)
t again
If the previous substitution changed something, go back to the label again.
s/\n\(.*\)\n/\1/
Remove the newlines that were added at the start and end.
p
Print the result.
Sed always matches as much of a pattern as it can (greedy matching), so if a pair occurs more than twice, it removes the last duplicate first and loops until only one occurrence is left.
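To see the H, $ and x steps in isolation, here is a minimal sketch on three hypothetical input lines (a, b, c; not the data from the question). It swaps the collected lines back into the working buffer and replaces every newline with | so that the leading newline left by the first append becomes visible.

```shell
# Hypothetical three-line input, only to illustrate the buffer handling.
result=$(printf '%s\n' a b c | sed -n 'H
$ {x
s/\n/|/g
p
}')
# Each newline shows up as "|", including the one in front of the first line.
printf '%s\n' "$result"
```

This should print |a|b|c, which is why the real script has to strip a leading newline before printing.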
Upvotes: 0
Reputation: 4267
If abc cvn xxxx appears before abc cvn bla..bla..n_columns, I just want to keep one of the two lines. It does not matter to me which one is kept.
If the output order doesn't matter, you can use sort:
sort -u -k1,2 file
Otherwise, you should use awk, as suggested by devnull.
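The awk answer referred to is not shown in this thread, so the following is only a sketch of one common idiom for this task, not necessarily what devnull posted. It keeps the first line seen for each combination of the first two columns and preserves the input order. The temporary file and the array name seen are assumptions made for illustration.

```shell
# Hypothetical sample data laid out like the question's input.
tmp=$(mktemp)
printf '%s\n' 'abc cvn bla' 'xnt yuk m_columns' 'abc cvn xxxx' \
    'vbh ast' 'sth rty' 'xnt yuk' > "$tmp"

# seen[$1, $2]++ is 0 (false) the first time a pair appears, so the
# negation is true and awk prints the line; later repeats are skipped.
result=$(awk '!seen[$1, $2]++' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

On a real file the whole thing reduces to awk '!seen[$1, $2]++' file > newfile.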
Upvotes: 0