Reputation: 11

BASH - Find duplicates in multiple files

I have multiple files in the same directory. Each file represents a user and contains the IPs used to log into that account, one per line.

I want to create a script that checks whether the same IP occurs in more than one file and, if so, prints the duplicates.

I've tried using awk, but with no luck. Any help is appreciated!

Upvotes: 0

Answers (4)

EvansWinner

Reputation: 158

How about something like:

diff -u <(cat * | sort) <(cat * | sort | uniq)

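In other words, the difference between all the files concatenated and sorted, and all the files concatenated, sorted, and with duplicates removed. The lines marked with a leading - in the diff body are the extra copies, i.e. the duplicates. Note that an IP repeated within a single file is reported as well.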
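To count each IP at most once per file, so that only addresses shared between different users are reported, one could deduplicate every file before merging them. A minimal sketch, assuming the per-user files are passed as arguments; the function name ips_in_multiple_files is made up for illustration:

```bash
#!/usr/bin/env bash
# Sketch: print each IP that occurs in more than one file.
# ips_in_multiple_files is a hypothetical name; pass the per-user files as arguments.
ips_in_multiple_files() {
    local f
    for f in "$@"; do
        sort -u -- "$f"      # each file contributes every IP at most once
    done | sort | uniq -d    # an IP that is still repeated must come from 2+ files
}
# Usage: ips_in_multiple_files /path/to/users/*
```

uniq -d prints one copy of each line that appears more than once in its sorted input, so no diff is needed.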

Upvotes: 0

Jamil Said

Reputation: 2103

Assuming that there are no repeated IP addresses within the same file, this should work for IPv4 addresses in most Bash versions:

#!/bin/bash
#For IP addresses v4, assuming no repeated IP addresses within the same file; the result is stored in the file /tmp/repeated-ips
mkdir -p /tmp
grep -rhEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /home/user/folder > /tmp/ipaddresses-holder
sort /tmp/ipaddresses-holder | uniq -d > /tmp/repeated-ips
exit 0

The script below is a little more complex, but it works whether or not there are repeated IP addresses within a single file:

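#!/bin/bash
#For IP addresses v4, the result is stored in the file /tmp/repeated-ips
mkdir -p /tmp
# Without -h, grep -r prefixes each match with its file name ("file:ip")
grep -rEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /home/user/folder > /tmp/ipaddresses-holder
# Keep each file:ip pair once, so repeats within one file collapse
sort -u /tmp/ipaddresses-holder  > /tmp/ipaddresses-holder2
# Extract just the IPs again, dropping the file name prefix
grep -rhEo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/ipaddresses-holder2 > /tmp/ipaddresses-holder3
# An IP that is still repeated occurs in more than one file
sort /tmp/ipaddresses-holder3 | uniq -d > /tmp/repeated-ips
exit 0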

In both cases, the result is stored in the file /tmp/repeated-ips.
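The second script could also be written as a single pipeline without temporary files. A sketch, assuming IPv4 addresses and GNU grep; repeated_ips is a hypothetical function name, and the directory is passed as its only argument:

```bash
#!/usr/bin/env bash
# Sketch: same idea as the second script, as one pipeline with no temporary files.
# repeated_ips is a hypothetical name; $1 is the directory holding the per-user files.
repeated_ips() {
    # -H prefixes every match with its file name, giving "file:ip" pairs
    grep -rHEo '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$1" |
        sort -u |           # drop repeats of the same IP within the same file
        sed 's/.*://' |     # strip the "file:" prefix (IPv4 addresses contain no colon)
        sort | uniq -d      # IPs still repeated appear in two or more files
}
# Usage: repeated_ips /home/user/folder
```

The greedy .*: in the sed expression removes everything up to the last colon, so file names that themselves contain a colon are handled too.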

Upvotes: 1

chw21

Reputation: 8140

Not sure I understand your question correctly, so here's what I think you want to do:

You have several files. Each file refers to a specific user and logs every IP address that user has logged in from. Example:

$ cat alice.txt
192.168.1.1
192.168.1.5
192.168.1.1
192.168.1.1
$ cat bob.txt
192.168.0.1
192.168.1.3
192.168.1.2
192.168.1.3
$ cat eve.txt
192.168.1.7
192.168.1.5
192.168.1.7
192.168.0.7

You want to find out whether the same IP address appears in multiple files.

Here's what I came up with.

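#!/usr/bin/env bash
# Pass the per-user files as arguments
for source_file in "$@"
do
    # Each distinct IP in this file, once
    sort -u "${source_file}" | while IFS= read -r search_term
    do
        # -F: fixed string, -x: whole-line match, so 192.168.1.1 does not also match 192.168.1.10
        found=$(grep -Fx --exclude="${source_file}" -- "${search_term}" "$@")
        if [[ -n "${found}" ]]; then
            echo "Found ${search_term} from ${source_file} also here:"
            echo "${found}"
        fi
    done
done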

It's probably not the best solution.
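A possibly faster variant of the same idea runs one grep per file instead of one per IP, because grep -f reads all the patterns from the source file at once. A sketch, assuming GNU grep (for --exclude applied to command-line files) and files like the examples above; report_shared_ips is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: for each file, list the lines of the other files that exactly match one of its IPs.
# report_shared_ips is a hypothetical name; assumes GNU grep for --exclude.
report_shared_ips() {
    local source_file matches
    for source_file in "$@"; do
        # -f: use every line of source_file as a pattern
        # -F: fixed strings, -x: whole-line match, -H: always print the "file:" prefix
        matches=$(grep -HFx -f "$source_file" --exclude="$source_file" -- "$@" | sort -u)
        if [[ -n $matches ]]; then
            printf 'IPs from %s also found in:\n%s\n' "$source_file" "$matches"
        fi
    done
}
# Usage: report_shared_ips alice.txt bob.txt eve.txt
```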

Upvotes: 0

Jay Rajput

Reputation: 1898

Use the following awk command:

awk '$0 in a {print FILENAME, "IP:", $0, "also in:", a[$0]; next} {a[$0] = FILENAME}' /tmp/user*

Assuming that you have files containing just the IPs, like this:

[tmp]$cat /tmp/user1
1.1.1.1
[tmp]$cat /tmp/user2
2.2.2.2
[tmp]$cat /tmp/user3
1.1.1.1

Output

[tmp]$awk '$0 in a {print FILENAME, "IP:", $0, "also in:", a[$0]; next} {a[$0] = FILENAME}' /tmp/user*
/tmp/user3 IP: 1.1.1.1 also in: /tmp/user1

Explanation

awk '
  $0 in a {                        # if the IP already exists in array a,
    print FILENAME, "IP:", $0,     # print the current file, the IP,
          "also in:", a[$0]        # and the file where it was first seen
    next                           # get the next record without further processing
  }
  { a[$0] = FILENAME }             # if we reach here, this is the first time we see
                                   # this IP, so store the file it came from
' /tmp/user*
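The awk above reports an IP that is repeated within one file as being "also in" that same file, and an IP shared by three files is printed once per extra file. If that is not what you want, a variant that records every distinct file per IP and prints one summary line at the end may fit better. A sketch; ip_file_report is a hypothetical name, and the output order is unspecified because awk's for (ip in array) loop has no defined order, so pipe it through sort if order matters:

```bash
#!/usr/bin/env bash
# Sketch: for each IP found in 2+ files, print the IP and every file it appears in.
# ip_file_report is a hypothetical name; should work with any POSIX awk.
ip_file_report() {
    awk '
        NF && !seen[FILENAME, $0]++ {       # first occurrence of this IP in this file
            files[$0] = files[$0] " " FILENAME
            count[$0]++                     # number of distinct files containing the IP
        }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip ":" files[ip]
        }
    ' "$@"
}
# Usage: ip_file_report /tmp/user*
```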

Upvotes: 1
