Reputation: 59
I have a generated CSV file that contains duplicate values. I would like to delete/remove those duplicate values using awk or sed commands.
Actual output
10.135.83.48,9042
10.135.83.46,9042
10.135.83.44,9042
10.5.197.25,10334
10.39.8.166,1500
10.135.83.48,9042
10.135.83.46,9042
10.135.83.44,9042
https://t-mobile.com,443
https://t-mobile.com,443
http://localhost:5059/abc/token,80
Expected output
10.135.83.48,9042
10.135.83.46,9042
10.135.83.44,9042
10.5.197.25,10334
10.39.8.166,1500
https://t-mobile.com,443
http://localhost:5059/abc/token,80
I got this output from a few property files. Below is the script I am trying:
#!/bin/bash
for file in $(ls);
do
#echo " --$file -- ";
grep -P '((?<=[^0-9.]|^)[1-9][0-9]{0,2}(\.([0-9]{0,3})){3}(?=[^0-9.]|$)|(http|ftp|https|ftps|sftp)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/+#-]*[\w@?^=%&/+#-])?|\.port|\.host|contact-points|\.uri|\.endpoint)' $file|grep '^[^#]' |awk '{split($0,a,"#"); print a[1]}'|awk '{split($0,a,"="); print a[1],a[2]}'|sed 's/^\|#/,/g'|awk '/http:\/\// {print $2,80}
/https:\/\// {print $2,443}
/Points/ {print $2,"9042"}
/host/ {h=$2}
/port/ {print h,$2; h=""}'|awk -F'[, ]' '{for(i=1;i<NF;i++){print $i,$NF}}'|awk 'BEGIN{OFS=","} {$1=$1} 1'|sed '/^[0-9]*$/d'|awk -F, '$1 != $2'
done |awk '!a[$0]++'
#echo "Done."
stty echo
cd ..
awk '!a[$0]++' is the command I am trying to combine with the above script. On its own this command works, but when I combine it with the script it does not work as expected.
Thanks in advance for your help.
Upvotes: 1
Views: 927
Reputation: 58371
This might work for you (GNU sed):
sed -E 'H;x;s/((\n[^\n]+)(\n.*)*)\2$/\1/;x;$!d;x;s/.//' file1
Append the current line to the hold space (HS); if it duplicates a line already collected there, remove it.
At the end of the file, swap to the HS, remove the first character (the leading newline introduced by the first append) and print the result.
N.B. This removes duplicates but retains original order.
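For readability, here is the same script with one command per line and a comment above each step (a sketch only; file1 stands in for your actual file):

sed -E '
  # append the current line to the hold space
  H
  # swap; the pattern space now holds every line seen so far
  x
  # if the newest (last) line repeats an earlier line, delete it
  s/((\n[^\n]+)(\n.*)*)\2$/\1/
  # swap the current input line back into the pattern space
  x
  # on every line but the last, delete the pattern space (print nothing yet)
  $!d
  # on the last line, fetch the accumulated unique lines
  x
  # strip the leading newline left over from the first append
  s/.//
' file1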
Upvotes: 1
Reputation: 84531
The simplest way to approach this (or one of the simplest) is to keep an array indexed by the records that have been seen. If the record isn't in the seen array, add it and print the record. If it is, just skip the record, e.g.
awk '$0 in seen{next}; {seen[$0]++}1' file
Example Use/Output
With your input in the file named dupes, you would have:
$ awk '$0 in seen{next}; {seen[$0]++}1' dupes
10.135.83.48,9042
10.135.83.46,9042
10.135.83.44,9042
10.5.197.25,10334
10.39.8.166,1500
https://t-mobile.com,443
http://localhost:5059/abc/token,80
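As a usage note, the same seen-array logic is often collapsed into the shorthand the question itself uses; the post-increment makes the condition true only on a record's first occurrence:

awk '!seen[$0]++' dupes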
Upvotes: 1
Reputation: 2705
Try moving the duplicate-removal stage inside the loop, so it runs as part of each file's pipeline, and iterate over a glob instead of parsing ls:
#!/bin/bash
for file in *;
do
#echo " --$file -- ";
grep -P '((?<=[^0-9.]|^)[1-9][0-9]{0,2}(\.([0-9]{0,3})){3}(?=[^0-9.]|$)|(http|ftp|https|ftps|sftp)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/+#-]*[\w@?^=%&/+#-])?|\.port|\.host|contact-points|\.uri|\.endpoint)' $file|grep '^[^#]' |awk '{split($0,a,"#"); print a[1]}'|awk '{split($0,a,"="); print a[1],a[2]}'|sed 's/^\|#/,/g'|awk '/http:\/\// {print $2,80}
/https:\/\// {print $2,443}
/Points/ {print $2,"9042"}
/host/ {h=$2}
/port/ {print h,$2; h=""}'|awk -F'[, ]' '{for(i=1;i<NF;i++){print $i,$NF}}'|awk 'BEGIN{OFS=","} {$1=$1} 1'|sed '/^[0-9]*$/d'|awk -F, '$1 != $2' | awk '!a[$0]++'
done
#echo "Done."
stty echo
cd ..
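One caveat: $file is unquoted in the grep call, so a filename containing spaces or glob characters would break the pipeline; quoting it as "$file" is safer.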
Upvotes: 1