Ali emaditaj

Reputation: 25

How to delete lines with duplicate numbers

I want to delete any lines that have the same number at the end, for example:

Input:

abc 77777
rgtds 77777
aswa 77777
gdf 845
sdf 845
ytn 963
fgnb 963

Output:

abc 77777
gdf 845
ytn 963

Note: out of all the lines that share the same number, exactly one must stay and the rest must be deleted.

Specifically, I want to convert this text file into the output below:

Input:

 c:/files/company/aj/psohz.mp4 905
 c:/files/company/rs/oxija.mp4 905
 c:/files/company/nw/kzlkg.mp4 905
 c:/files/company/wn/wpqov.mp4 905
 c:/files/company/qi/jzdjg.mp4 905
 c:/files/company/kq/dadfr..mp4 905
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/fx/jszmn.jpg 7839
 c:/files/company/me/plsqx.mp4 7839
 c:/files/company/xm/uswjb.mp4 7839
 c:/files/company/ay/pnnhu.pdf 8636184
 c:/files/company/os/glwou.pdf 8636184
 c:/files/company/px/kucdu.pdf 8636184

Output:

 c:/files/company/kq/dadfr..mp4 905
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/ay/pnnhu.pdf 8636184

Upvotes: 0

Views: 38

Answers (2)

choroba

Reputation: 241918

If the same numbers are always grouped together, you can use uniq (tested with the version from GNU coreutils):

uniq -f1 input.txt

-f1 means skip the first field when checking for duplicates.

Note that it returns the first element of each group, i.e. psohz instead of dadfr in your example. It's not clear which element of each group you wanted, as your expected output keeps the last line of the first group but the first line of the other groups.
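For instance, run against the second sample input above, it should print the first line of each group (assuming GNU uniq and exactly two fields per line):

$ uniq -f1 input.txt
 c:/files/company/aj/psohz.mp4 905
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/ay/pnnhu.pdf 8636184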

If the same numbers aren't grouped together, use sort to group them together:

sort -k2 -su input.txt
  • -s means stable, i.e. you'll always get the first element of each group, but the groups won't appear in the original order in the output (see the example below)
  • -u means unique
  • -k2 means use only field 2 in comparisons
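
Run on the same sample input, this should keep the first line of each group, with the groups ordered lexically by the number field, so 7839 sorts before 905 (assuming GNU sort's default lexical comparison; if you want numeric order instead, adding -n should do it):

$ sort -k2 -su input.txt
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/ay/pnnhu.pdf 8636184
 c:/files/company/aj/psohz.mp4 905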

If you want the first element of each group, with the lines kept in the same order as in the input, you can use perl:

perl -ane 'print unless $seen{ $F[1] }++' -- input.txt
  • -n reads the input line by line
  • -a splits the input on whitespace into the @F array
  • the second field of each line ($F[1]) is saved as a key in the %seen hash. The first time a number is seen, its line is printed; any following occurrence isn't, as $seen{ $F[1] } is then greater than 0, i.e. true (see the sketch below)
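
Unlike uniq, this works even when lines with the same number aren't adjacent. A minimal sketch, using the question's first example data shuffled:

$ perl -ane 'print unless $seen{ $F[1] }++' <<'EOF'
abc 77777
gdf 845
rgtds 77777
sdf 845
EOF
abc 77777
gdf 845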

Upvotes: 3

Benjamin W.

Reputation: 52162

If you know that there are always just two columns (i.e., no blanks in the filename) and that the lines with the same number are always in the same block, you can use uniq:

$ uniq -f1 infile
 c:/files/company/aj/psohz.mp4 905
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/ay/pnnhu.pdf 8636184

-f1 says to ignore the first field when asserting uniqueness.

If you don't know about blanks, and the same numbers might be anywhere in the file, you can use awk:

$ awk '!a[$NF]++' infile
 c:/files/company/aj/psohz.mp4 905
 c:/files/company/kp/xmpye.jpg 7839
 c:/files/company/ay/pnnhu.pdf 8636184

This counts the occurrences of the last field of each line, and if the count is zero before incrementing, the line gets printed. It's a compact way of expressing

awk '{ if (a[$NF] == 0) { print; a[$NF] += 1 } }' infile
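
Because the key is $NF, the last whitespace-separated field, this also copes with blanks in the filename, which would throw off uniq -f1. A small illustration with made-up paths containing a space:

$ awk '!a[$NF]++' <<'EOF'
 c:/files/my company/aj/psohz.mp4 905
 c:/files/my company/rs/oxija.mp4 905
EOF
 c:/files/my company/aj/psohz.mp4 905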

Upvotes: 1
