How to skip repeated entries in a .csv file

Question

I'm new to bash scripting. I have a text file (URLs.txt) containing a list of subdomains (URLs), and I'm creating a .csv file (subdomainIP.csv) with 2 columns: the 1st column contains subdomains (Subdomain) and the 2nd contains IP addresses (IP). The columns are separated by ",". My code is meant to read each line of URLs.txt, find its IP address, and write the subdomain and its IP address to the .csv file.

Whenever I find the IP address of a domain and want to add it as a new entry to the .csv file, I first want to check the existing entries in the 2nd column. If the same IP address is already there, I don't want to add the new entry; otherwise I want to add it. I tried to do this by adding these lines to my code:

awk '{ if ($IP ~ $ipValue) print "No add"
            else echo "${line}, ${ipValue}" >> subdomainIP.csv}'  subdomainIP.csv

but I get this error:

awk: cmd. line:2:       else echo "${line}, ${ipValue}" >> subdomainIP.csv}
awk: cmd. line:2:                                       ^ syntax error

What's wrong?

tshiono · Accepted Answer

Would you please try the following:

declare -A seen                         # remembers which IPs have appeared
echo "Subdomain,IP" > subdomainIP.csv   # overwrite the file rather than append
while IFS= read -r line; do
    ipValue=                            # initialize the value
    while IFS= read -r ip; do
        if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            ipValue+="${ip}-"           # append the results with "-"
        fi
    done < <(dig +short "$line")        # dig may print several lines
    ipValue=${ipValue%-}                # remove trailing "-" if any
    if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
                # if the IP is not empty and not in the previous list
        echo "$line,$ipValue" >> subdomainIP.csv
    fi
done < URLs.txt

The associative array seen is the key here. It is indexed by an arbitrary string (the IP address in this case) and stores a value associated with that string, which makes it well suited to checking whether an IP address has already appeared on an earlier input line.
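
To see the seen idiom in isolation, here is a minimal sketch that runs without dig. The pairs array and its hostnames and addresses are made-up sample data (documentation-range IPs), standing in for what the loop above would produce. It tests for an empty entry rather than using the post-increment, which should behave the same way for this purpose. It needs bash 4 or later for declare -A.

```shell
#!/usr/bin/env bash
# Hypothetical sample rows in "subdomain,IP" form.
# The third row repeats the IP of the first row.
pairs=(
    "a.example.com,192.0.2.10"
    "b.example.com,192.0.2.20"
    "c.example.com,192.0.2.10"
)

declare -A seen      # key: IP string, value: 1 once that IP has been seen
result=()            # rows that pass the duplicate check

for pair in "${pairs[@]}"; do
    ip=${pair#*,}                       # text after the first comma
    if [[ -z ${seen[$ip]:-} ]]; then    # no entry yet: first time for this IP
        seen[$ip]=1
        result+=("$pair")
    fi
done

printf '%s\n' "${result[@]}"
```

With this sample data only the rows for a.example.com and b.example.com are printed; the row for c.example.com is skipped because its IP is already recorded in seen.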
