Reputation: 2696
Given a sorted file like so:
AAA 1 2 3
AAA 2 3 4
AAA 3 4 2
BBB 1 1 1
BBB 1 2 1
and a desired output of
AAA 1 2 3
BBB 1 1 1
What's the best way to achieve this with sed?
Basically, if the col starts with the same field as the previous line, how do I delete it? The rest of the data must be kept in the output.
I imagine there must be some way to do this either using the hold buffer, branching, or the test command.
Upvotes: 0
Views: 114
Reputation: 5072
Using sed:
#!/bin/sed -nf
P
: loop
s/\s.*//
N
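/^\([^\n][^\n]*\)\n\1\s/ b loop
D
Firstly, we must pass the -n flag to sed so it will only print what we tell it to (the -nf in the first line does this when the script is run directly).
We start off by printing the first line of the pattern space with the "P" command. The first line of each group must always be printed, and because of -n nothing else is printed unless we ask for it.
Now we will do a loop. We define a loop with a starting label through the ":" command (in this case we name the label "loop"), and when necessary we jump back to this label with a "b" command (or a "t" test command). This loop is quite simple.
First, "s/\s.*//" deletes everything from the first whitespace character onwards, so only the first field (the key) is left. Then "N" appends a newline and the next input line. Finally, we test the pattern space with a regular expression containing a capture group (which starts with \( and ends with \)). In this case we capture, from the start of the pattern space (^), all characters that aren't a newline character (i.e. [^\n]) up to the end of the first line. We do this by matching at least one non-newline character followed by an arbitrary sequence of them, which prevents matching an empty string before the newline. After the capture, we match a newline character followed by the result of the capture, using the back-reference \1, which contains the input matched by that first capture, and then a whitespace character (\s). The ^ and the \s make the whole key match, so a line starting with "AA" is not treated as a repeat of an "XAA" line, and an "AAB" line is not treated as a repeat of an "AA" line.
If this succeeds, we have a line that repeats the first field, so we jump back to the start of the loop with the "b" branch command. Since \s also matches a newline, the "s" command then throws the repeated line away. If the test fails, the new line starts a new group: "D" deletes the key and the newline and restarts the cycle with the new line, without reading more input, so "P" prints it next. When "N" runs out of input, sed exits, and because of -n the leftover key is not printed. Note that \s and \n inside a bracket expression are GNU sed extensions.
This can be shortened into a single line (notice that we have renamed the "loop" label to "a"; the -n flag is still required):
sed -n -e 'P;:a;s/\s.*//;N;/^\([^\n][^\n]*\)\n\1\s/ba;D'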
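To see why the key test should be anchored with ^ and followed by \s, here is a small sketch that runs the one-liner with and without them on three different keys that overlap (XAA, AA, AAB). It assumes GNU sed; the two function names are made up for this demo.

```shell
#!/bin/sh
# Sketch: assumes GNU sed (\s, and \n inside [^...]).
# dedup_unanchored and dedup_anchored are hypothetical names for this demo.

# Key test without anchoring: /\([^\n][^\n]*\)\n\1/
dedup_unanchored() {
    sed -n -e 'P;:a;s/\s.*//;N;/\([^\n][^\n]*\)\n\1/ba;D'
}

# Key test anchored at the start of the pattern space and followed by
# whitespace on the new line: /^\([^\n][^\n]*\)\n\1\s/
dedup_anchored() {
    sed -n -e 'P;:a;s/\s.*//;N;/^\([^\n][^\n]*\)\n\1\s/ba;D'
}

# Three different keys that overlap: "XAA" ends with "AA",
# and "AA" is a prefix of "AAB".
input='XAA 1
AA 2
AAB 3'

printf '%s\n' "$input" | dedup_unanchored   # prints only: XAA 1
echo ---
printf '%s\n' "$input" | dedup_anchored     # prints all three lines
```

On the question's sample data both versions print AAA 1 2 3 and BBB 1 1 1; the difference only shows up when one key is a suffix or prefix of its neighbour.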
Upvotes: 0
Reputation: 54402
One way using GNU awk:
awk '!array[$1]++' file.txt
Results:
AAA 1 2 3
BBB 1 1 1
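How the !array[$1]++ idiom works, sketched on deliberately unsorted input (the extra AAA 9 9 9 line is made up for the demo). Only POSIX awk features are used, so plain awk should behave the same as gawk here.

```shell
#!/bin/sh
# array[$1]++ returns how often this first field has been seen so far
# (0 the first time) and then increments it, so !array[$1]++ is true only
# for the first line with a given key. A pattern without an action
# prints the line.

# Unsorted input on purpose: key AAA shows up again after BBB.
printf 'AAA 1 2 3\nBBB 1 1 1\nAAA 9 9 9\n' | awk '!array[$1]++'
# prints:
# AAA 1 2 3
# BBB 1 1 1
```

Because it remembers every key, this version does not need sorted input; the trade-off is that the array grows with the number of distinct keys.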
Upvotes: 0
Reputation: 58430
This might work for you (GNU sed):
sed -r ':a;$!N;s/^((\S+\s).*)\n\2.*/\1/;ta;P;D' file
or maybe just:
sort -uk1,1 file
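A commented sketch that runs both commands on the question's sample data. It assumes GNU sed (-r, \S, \s) and GNU sort; the temporary file only exists because both commands take a file name.

```shell
#!/bin/sh
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf 'AAA 1 2 3\nAAA 2 3 4\nAAA 3 4 2\nBBB 1 1 1\nBBB 1 2 1\n' > "$file"

# :a         label to loop back to
# $!N        unless on the last line, append a newline and the next line
# s/.../\1/  if the appended line starts with the same first field plus
#            whitespace (\2), keep only the first line (\1)
# ta         if that substitution happened, loop to append another line
# P;D        print up to the first newline, delete it, and restart the
#            cycle with whatever is left
sed -r ':a;$!N;s/^((\S+\s).*)\n\2.*/\1/;ta;P;D' "$file"

echo ---
# -k1,1 compares lines on the first field only; -u outputs only the
# first of each run of lines that compare equal.
sort -u -k1,1 "$file"
```

Both print AAA 1 2 3 and BBB 1 1 1 here. Note that sort also reorders its output by key, which makes no difference for an already sorted file.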
Upvotes: 0
Reputation:
This could be done with AWK:
$ gawk '{if (last != $1) print; last = $1}' in.txt
AAA 1 2 3
BBB 1 1 1
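Unlike the !array[$1]++ one-liner, this version only remembers the previous line's first field, so it relies on the input being sorted (or at least grouped) by key. A sketch on unsorted input makes that visible; the AAA 9 9 9 line is made up for the demo, and plain awk is used since nothing here needs gawk.

```shell
#!/bin/sh
# last is unset at first, so for ordinary keys such as AAA the first line
# is printed; after that a line is printed only when its first field
# differs from the previous line's.
printf 'AAA 1 2 3\nBBB 1 1 1\nAAA 9 9 9\n' | awk '{if (last != $1) print; last = $1}'
# prints:
# AAA 1 2 3
# BBB 1 1 1
# AAA 9 9 9
```

The returning AAA key is printed again because only adjacent lines are compared; in exchange the memory use stays constant however many keys there are.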
Upvotes: 1
Reputation: 65791
Maybe there's a simpler way with sed, but:
sed ':a;N;/^\([[:alnum:]]*[[:space:]]\).*\n\1/{s/\n.*//;ta;};P;D'
This produces the output
AAA 1 2 3
BBB 1 1 1
which matches the desired output in the question as well as its description:
if the col starts with the same field as the previous line, how do I delete it?
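A commented sketch of this approach, assuming GNU sed. GNU sed reads a branch label up to the next semicolon or newline, so ta} can be taken as a jump to a label named "a}"; writing ta;} avoids that. The sketch also anchors the key with ^ so that a key which merely ends with the next one (XAA followed by AA) is not treated as a repeat. The input lines are made up for the demo.

```shell
#!/bin/sh
# :a            label
# N             append a newline and the next line
# /^\(...\)/    the key is the leading run of letters/digits plus one
#               whitespace character, anchored at the start of the line
# {s/\n.*//;    if the new line starts with the same key, drop it ...
#  ta;}         ... and loop back to append another line
# P;D           otherwise print the kept line and restart with the new one
printf 'XAA 1\nAA 2\nAAB 3\nAAB 4\n' |
    sed ':a;N;/^\([[:alnum:]]*[[:space:]]\).*\n\1/{s/\n.*//;ta;};P;D'
# prints:
# XAA 1
# AA 2
# AAB 3
```

The last group is printed because GNU sed prints the pattern space when N finds no more input (unless -n is given), which is why this command, unlike the script in the first answer, is run without -n.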
Upvotes: 0