helen

Reputation: 587

How to skip repeated entries in a .csv file

I'm new to bash scripting. I have a text file (URLs.txt) containing a list of subdomains (URLs), and I'm creating a .csv file (subdomainIP.csv) with 2 columns: the 1st column contains the subdomains (Subdomain) and the 2nd contains their IP addresses (IP). The columns are separated by ",". My code reads each line of URLs.txt, finds the IP address of that subdomain, and writes the subdomain and its IP address to the .csv file.

Whenever I find the IP address of a subdomain and want to add it as a new entry to the .csv file, I want to check the existing entries in the 2nd column first. If the same IP address is already there, I don't want to add the new entry; otherwise, I want to add it. I tried to do this by adding these lines to my code:

awk '{ if ($IP ~ $ipValue) print "No add"
            else echo "${line}, ${ipValue}" >> subdomainIP.csv}'  subdomainIP.csv

but I receive this error:

awk: cmd. line:2:       else echo "${line}, ${ipValue}" >> subdomainIP.csv}
awk: cmd. line:2:                                       ^ syntax error

What's wrong?

Upvotes: 3

Views: 135

Answers (2)

tshiono

Reputation: 22032

Would you please try the following:

declare -A seen                         # memorize the appearance of IPs
echo "Subdomain,IP" > subdomainIP.csv   # overwrite the file rather than append
while IFS= read -r line; do
    ipValue=                            # initialize the value
    while IFS= read -r ip; do
        if [[ $ip =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            ipValue+="${ip}-"           # append the results with "-"
        fi
    done < <(dig +short "$line")        # dig +short may print several lines
    ipValue=${ipValue%-}                # remove trailing "-" if any
    if [[ -n $ipValue ]] && (( seen[$ipValue]++ == 0 )); then
                # the IP is non-empty and has not been seen before
        echo "$line,$ipValue" >> subdomainIP.csv
    fi
done < URLs.txt
  • The associative array seen is the key to this. It is indexed by an arbitrary string (the IP address in this case) and remembers the value associated with that string, which makes it suitable for checking whether an IP address has already appeared in an earlier input line.
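The seen idiom can also be tried on its own. This is a minimal sketch, not part of the answer above: it replaces the dig lookups with a few hypothetical sample lines and wraps the loop in a function named dedup_by_ip, a name chosen only for this illustration. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Needs bash 4 or later for associative arrays.

# dedup_by_ip (hypothetical name) reads "subdomain ip" pairs on stdin and
# prints "subdomain,ip" only for the first subdomain seen with each IP.
dedup_by_ip() {
    local -A seen                  # IP address -> number of times seen so far
    local host ip
    while read -r host ip; do
        # seen[$ip]++ yields the old count (0 on first sight) and then
        # increments it, so the test is true only the first time.
        if (( seen[$ip]++ == 0 )); then
            echo "$host,$ip"
        fi
    done
}

# Hypothetical sample data (documentation-range addresses) instead of dig output.
dedup_by_ip <<'EOF'
a.example.com 192.0.2.1
b.example.com 192.0.2.2
c.example.com 192.0.2.1
EOF
```

Running it prints the a.example.com and b.example.com lines; c.example.com is skipped because 192.0.2.1 has already been seen.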

Upvotes: 1

Enlico

Reputation: 28450

There are some issues in your code. Here are a few of them.

  • If the awk script is in single quotes, as in awk 'script' file, the shell will not expand variables such as $var inside script. If you want variable expansion, use double quotes. Compare echo hello | awk "{ print \"$PATH\" }" with echo hello | awk '{ print "$PATH" }'.
  • However, if you do so, then the shell will also try to expand $0, $1, $NF, ..., which is certainly not what you want. Instead, you can concatenate single- and double-quoted strings as needed, e.g. echo hello | awk '{ print "$0:"$0 >> "log"; print "$PATH:'"$PATH"'" >> "log" }'
  • Based on what I see in O'Reilly's sed & awk, when you redirect output to a file from within an awk script, you have to quote the file name, as I've done in the command above for the file named log; an unquoted name is treated as an awk variable.
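Putting these points together, here is a minimal sketch that does both the check and the append in awk. Two things differ from the question's attempt: awk has no echo command, so the new row is written with print, and the shell values are passed in with awk's -v option instead of being spliced into the quoted script (an alternative to the quote concatenation described above). The helper name add_if_new and the sample subdomains and addresses are hypothetical, and the sketch assumes rows are written without a space after the comma, so that $2 compares equal to the bare IP.

```bash
#!/usr/bin/env bash
# add_if_new is a hypothetical helper; subdomainIP.csv is the question's file.
add_if_new() {
    local host=$1 ip=$2
    # awk must be able to open its input file, so create it with a header first.
    [ -f subdomainIP.csv ] || echo "Subdomain,IP" > subdomainIP.csv
    # -v copies the shell values into awk variables, so the awk program
    # itself can stay in single quotes.
    awk -F',' -v host="$host" -v ip="$ip" '
        $2 == ip { found = 1 }          # this IP is already in column 2
        END {
            if (found) {
                print "No add"
            } else {
                # awk writes with print, not echo; the file name is quoted
                print host "," ip >> "subdomainIP.csv"
            }
        }
    ' subdomainIP.csv
}

cd "$(mktemp -d)" || exit 1             # scratch directory: leave any real CSV alone
add_if_new www.example.com 192.0.2.10
add_if_new mail.example.com 192.0.2.10  # same IP, so this prints "No add"
add_if_new ftp.example.com 192.0.2.20
cat subdomainIP.csv
```

The second call prints No add, and the final cat shows only the header plus the www.example.com and ftp.example.com rows.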

Upvotes: 0
