Reputation: 25
I want to delete any lines that have the same number at the end. For example:
Input:
abc 77777
rgtds 77777
aswa 77777
gdf 845
sdf 845
ytn 963
fgnb 963
Output:
abc 77777
gdf 845
ytn 963
Note: every line with a duplicated number must be deleted, but one of the lines that shared that number must stay.
Here is the actual text file I want to convert, with the output I want:
Input:
c:/files/company/aj/psohz.mp4 905
c:/files/company/rs/oxija.mp4 905
c:/files/company/nw/kzlkg.mp4 905
c:/files/company/wn/wpqov.mp4 905
c:/files/company/qi/jzdjg.mp4 905
c:/files/company/kq/dadfr..mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/fx/jszmn.jpg 7839
c:/files/company/me/plsqx.mp4 7839
c:/files/company/xm/uswjb.mp4 7839
c:/files/company/ay/pnnhu.pdf 8636184
c:/files/company/os/glwou.pdf 8636184
c:/files/company/px/kucdu.pdf 8636184
Output:
c:/files/company/kq/dadfr..mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184
Upvotes: 0
Views: 38
Reputation: 241918
If the same numbers are always grouped together, you can use uniq
(tested with the version from GNU coreutils):
uniq -f1 input.txt
-f1 means skip the first field when checking for duplicates.
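On the sample input this should print:
c:/files/company/aj/psohz.mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184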
Note that it returns the first element of each group, i.e. psohz instead of dadfr in your example. It's not clear what element of each group you wanted, as you returned the last one from the first group, but the first element of the other groups.
If the same numbers aren't grouped together, use sort to group them together:
sort -k2 -su input.txt
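The keys are compared as strings here, so on the sample input the groups should come out in lexical order of the numbers:
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184
c:/files/company/aj/psohz.mp4 905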
-s means stable, i.e. you'll always get the first element of each group, but the groups won't be sorted in the original order in the output
-u means unique
-k2 means use only field 2 in comparisons
If you want the first element of each group with the lines ordered the same as in the input, you can use perl:
perl -ane 'print unless $seen{ $F[1] }++' -- input.txt
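On the sample input this should keep the first line of each number group, in the original input order:
c:/files/company/aj/psohz.mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184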
-n reads the input line by line
-a splits the input on whitespace into the @F array
%seen is a hash that counts the occurrences of each number. If you see a number for the first time, the line will be printed, but any following occurrence won't, as $seen{ $F[1] } will be greater than 0, i.e. true.
Upvotes: 3
Reputation: 52162
If you know that there are always just two columns (i.e., no blanks in the filename) and that the lines with the same number are always in the same block, you can use uniq:
$ uniq -f1 infile
c:/files/company/aj/psohz.mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184
-f1 says to ignore the first field when asserting uniqueness.
If you don't know about blanks, and the same numbers might be anywhere in the file, you can use awk:
$ awk '!a[$NF]++' infile
c:/files/company/aj/psohz.mp4 905
c:/files/company/kp/xmpye.jpg 7839
c:/files/company/ay/pnnhu.pdf 8636184
This counts the number of occurrences of the last field of each line, and if that number is zero before incrementing, the line gets printed. It's a compact way of expressing
awk '{ if (a[$NF] == 0) { print; a[$NF] += 1 } }' infile
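Because it keys on the last field, this awk approach also copes with blanks in the filenames and with equal numbers scattered anywhere in the file. A quick check with made-up input (the file names here are hypothetical):
$ printf '%s\n' 'my file.mp4 905' 'other.jpg 7839' 'another file.mp4 905' | awk '!a[$NF]++'
my file.mp4 905
other.jpg 7839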
Upvotes: 1