Reputation: 401

Output the first duplicate in a csv file

How do I output only the first line of each set of duplicates in a CSV file? For example, if I have:

00:0D:67:24:D7:25,1,-34,123,135  
00:0D:67:24:D7:25,1,-84,567,654  
00:0D:67:24:D7:26,1,-83,456,234  
00:0D:67:24:D7:26,1,-86,123,124  
00:0D:67:24:D7:2C,1,-56,245,134  
00:0D:67:24:D7:2C,1,-83,442,123  
00:18:E7:EB:BC:A9,5,-70,123,136  
00:18:E7:EB:BC:A9,5,-90,986,545  
00:22:A4:25:A8:F9,6,-81,124,234  
00:22:A4:25:A8:F9,6,-90,456,654  
64:0F:28:D9:6E:F9,1,-67,789,766  
64:0F:28:D9:6E:F9,1,-85,765,123  
74:9D:DC:CB:73:89,10,-70,253,777

I want my output to look like this:

00:0D:67:24:D7:25,1,-34,123,135  
00:0D:67:24:D7:26,1,-83,456,234  
00:0D:67:24:D7:2C,1,-56,245,134  
00:18:E7:EB:BC:A9,5,-70,123,136  
00:22:A4:25:A8:F9,6,-81,124,234  
64:0F:28:D9:6E:F9,1,-67,789,766  
74:9D:DC:CB:73:89,10,-70,253,777

I was thinking along the lines of first outputting the first line of the CSV file, with something like awk (code that outputs first row) >> file.csv, and then comparing the first field of that row to the first field of the next row. If they are the same, check the next row, and so on until a row with a different first field comes up. The code then outputs that new row, again with awk (code that outputs) >> file.csv, and repeats until the whole file has been checked.

I'm kind of new to bash coding, but I love it so far. I'm currently parsing a CSV file and I need some help. Thanks, everyone.

Upvotes: 2

Answers (4)

svante

Reputation: 1385

Using uniq:

sort lines.csv | uniq -w 17

This works provided your first column has a fixed width (17 characters here). lines.csv is the file containing your original input.
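One thing to watch: sort orders lines by their full content, so within each address the line that survives is the one that sorts first, which is not necessarily the one that came first in the file. If the input is already grouped by the first column, as in the question's sample, the sort step can probably be dropped. A minimal sketch, assuming GNU uniq (-w is not part of POSIX) and a made-up temporary copy of a few sample rows:

```shell
#!/usr/bin/env bash
# Sketch: keep the first line of each run of lines sharing the same 17-character prefix.
# Assumes GNU uniq (-w is a GNU extension) and input already grouped by the first field.
# The temporary directory and the shortened sample data are made up for this demo.
tmpdir=$(mktemp -d)
cat > "$tmpdir/lines.csv" <<'EOF'
00:0D:67:24:D7:25,1,-34,123,135
00:0D:67:24:D7:25,1,-84,567,654
00:0D:67:24:D7:26,1,-83,456,234
74:9D:DC:CB:73:89,10,-70,253,777
EOF

# -w 17 compares only the first 17 characters (the MAC address)
first_per_mac=$(uniq -w 17 "$tmpdir/lines.csv")
printf '%s\n' "$first_per_mac"
rm -r "$tmpdir"
```

With these four rows, the second line (the repeat of the :25 address) is dropped and the other three are printed in their original order.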

Upvotes: 1

Vijay

Reputation: 67301

perl -F, -lane '$x{$F[0]}++;print if($x{$F[0]}==1)' your_file

If you want to change the file in place:

perl -i -F, -lane '$x{$F[0]}++;print if($x{$F[0]}==1)' your_file

Upvotes: 0

Guru

Reputation: 17054

Using awk:

awk -F, '!a[$1]++' file.csv

awk builds an array in which the 1st column is the key and the value is the number of times that key has been seen. '!a[$1]++' is true only for the 1st occurrence of a given 1st-column value, so only the first line with that value gets printed.
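To make the one-liner easier to follow, here is roughly the same logic written out step by step. This is only an illustrative sketch: the function name first_by_key, the array name seen, and the sample rows are made up.

```shell
#!/usr/bin/env bash
# Sketch: the one-liner '!a[$1]++' spelled out.
# first_by_key and the sample rows below are hypothetical, for illustration only.
first_by_key() {
  awk -F, '
    {
      # seen[$1] is unset (treated as 0) the first time a key appears
      if (seen[$1] == 0) {
        print            # first occurrence: print the whole line
      }
      seen[$1]++         # count this occurrence of the key
    }
  ' "$@"
}

printf '%s\n' 'a,1' 'a,2' 'b,3' 'a,4' 'b,5' | first_by_key
```

This prints a,1 and b,3. Because the array remembers every key seen so far, the later a,4 and b,5 are skipped even though they are not directly after the first a and b rows.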

Upvotes: 5

cwgem

Reputation: 2809

If I understand what you're getting at you want something like this:

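prev_field=""
# IFS= and -r keep leading/trailing whitespace and backslashes intact
while IFS= read -r line
do
  # Extract the first comma-separated field of the current line
  current_field=$(printf '%s\n' "$line" | cut -d ',' -f 1)
  # Print the line only when its first field differs from the previous line's
  [[ "$current_field" != "$prev_field" ]] && printf '%s\n' "$line"
  prev_field=$current_field
done < "stuff.csv"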

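Where stuff.csv is the name of your file. That's assuming that what you're trying to do is take the first field of each CSV row and print only the first occurrence of each value; if that's the case, I think your output may be missing a few. Note that the loop only compares each line with the one before it, so it relies on rows with the same first field being adjacent, as they are in your sample.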
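For comparison, the same "compare with the previous line" idea can be sketched in awk, which avoids starting a cut process for every line. The function name first_of_each_run and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a line only when its first field differs from the previous line's.
# Assumes rows with the same first field are adjacent in the input.
# first_of_each_run and the sample rows are hypothetical, for illustration only.
first_of_each_run() {
  # NR == 1 makes sure the very first line is always printed
  awk -F, 'NR == 1 || $1 != prev { print } { prev = $1 }' "$@"
}

printf '%s\n' 'a,1' 'a,2' 'b,3' 'a,4' | first_of_each_run
```

This prints a,1, b,3 and a,4. The last row shows the limitation: a key that comes back after a different key is printed again, because only neighbouring lines are compared.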

Upvotes: 1
