CodeMed

Reputation: 9189

Counting unique occurrences of a string in a document

I am reading a log file in Java. For each line, I check whether the line contains an IP address. If it does, I want to add 1 to the count of how many times that IP address has appeared in the log file. How can I accomplish this in Java?

The code below successfully extracts the IP address from each line that contains one, but the logic for counting occurrences of each IP address does not work.

void read(String fileName) throws IOException {
    BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
    int counter = 0;
    ArrayList<IPHolder> ips = new ArrayList<IPHolder>();
    try {
        String line;
        while ((line = br.readLine()) != null) {
            if(!getIP(line).equals("0.0.0.0")){
                if(ips.size()==0){
                    IPHolder newIP = new IPHolder();
                    newIP.setIp(getIP(line));
                    newIP.setCount(0);
                    ips.add(newIP);
                }
                for(int j=0;j<ips.size();j++){
                    if(ips.get(j).getIp().equals(getIP(line))){
                        ips.get(j).setCount(ips.get(j).getCount()+1);
                    }else{
                        IPHolder newIP = new IPHolder();
                        newIP.setIp(getIP(line));
                        newIP.setCount(0);
                        ips.add(newIP);
                    }
                }
                if(counter % 1000 == 0){System.out.println(counter+", "+ips.size());}
                counter+=1;
            }
        }
    } finally {br.close();}
    for(int k=0;k<ips.size();k++){
        System.out.println("ip, count: "+ips.get(k).getIp()+" , "+ips.get(k).getCount());
    }
}

public String getIP(String ipString){//extracts an ip from a string if the string contains an ip
    String IPADDRESS_PATTERN = 
    "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    Pattern pattern = Pattern.compile(IPADDRESS_PATTERN);
    Matcher matcher = pattern.matcher(ipString);
    if (matcher.find()) {
        return matcher.group();
    }
    else{
        return "0.0.0.0";
    }
}

The holder class is:

public class IPHolder {

    private String ip;
    private int count;

    public String getIp(){return ip;}
    public void setIp(String i){ip=i;}

    public int getCount(){return count;}
    public void setCount(int ct){count=ct;}
}

Upvotes: 0

Views: 1510

Answers (2)

Phiwa

Reputation: 319

The keyword to search for in this case is HashMap. A HashMap is a collection of key-value pairs in which each key appears at most once (here the keys are the IPs and the values are their counts):

"192.168.1.12" - 12
"192.168.1.13" - 17
"192.168.1.14" - 9

and so on. Looking up a key in the map is much easier than iterating over your list of holder objects every time to find out whether a holder for that IP already exists.

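import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map.Entry;
import java.util.Set;
import java.util.TreeMap;

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(/*Your file */)));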

HashMap<String, Integer> occurrences = new HashMap<String, Integer>();

String line = null;

while( (line = br.readLine()) != null) {

    // Iterate over lines and search for ip address patterns
    String[] addressesFoundInLine = ...;


    for(String ip: addressesFoundInLine ) {

        // Did this address already appear earlier in the file? If yes, increase its counter by 1
        if(occurrences.containsKey(ip))
            occurrences.put(ip, occurrences.get(ip)+1);

        // If not, create a new entry for this address
        else
            occurrences.put(ip, 1);
    } 
}


// A TreeMap keeps its entries sorted by key if the keys implement 'Comparable', which is the case for Strings and Integers
TreeMap<Integer, ArrayList<String>> turnedAround = new TreeMap<Integer, ArrayList<String>>();

Set<Entry<String, Integer>> es = occurrences.entrySet();

// Switch keys and values of HashMap and create a new TreeMap (in case there are two ips with the same count, add them to a list)
for(Entry<String, Integer> en: es) {

    if(turnedAround.containsKey(en.getValue()))         
        turnedAround.get(en.getValue()).add((String) en.getKey());
    else {
        ArrayList<String> ips = new ArrayList<String>();
        ips.add(en.getKey());
        turnedAround.put(en.getValue(), ips);
    }

}

// Print out the values in ascending order of count (IPs with the same count are printed in no particular order; ordering them would require another sorting step)
for(Entry<Integer, ArrayList<String>> entry: turnedAround.entrySet()) {         
    for(String s: entry.getValue())
        System.out.println(s + " - " + entry.getKey());         
}

In my case the output was the following:

192.168.1.19 - 4
192.168.1.18 - 7
192.168.1.27 - 19
192.168.1.13 - 19
192.168.1.12 - 28

I answered this question about half an hour ago and I guess that is exactly what you are searching for, so if you need some example code, take a look at it.
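
For reference, here is a minimal sketch of how the counting step could be written more compactly. It assumes Java 8 or later (for `Map.merge`) and reuses the regex from the question; the class name `IpCounter` and the methods `extractIps` and `countIps` are hypothetical names chosen for illustration. `extractIps` is one possible way to produce the addresses found in a single line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not from the original post.
class IpCounter {
    // Regex taken from the question; compiled once and reused for every line.
    private static final Pattern IP_PATTERN = Pattern.compile(
        "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // Returns every IP-like match in the line (possibly none).
    static List<String> extractIps(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IP_PATTERN.matcher(line);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Counts how often each IP appears across all lines.
    static Map<String, Integer> countIps(Iterable<String> lines) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String line : lines) {
            for (String ip : extractIps(line)) {
                // Inserts 1 for a new key, otherwise adds 1 to the existing count.
                occurrences.merge(ip, 1, Integer::sum);
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "GET / from 192.168.1.12",
            "no address here",
            "192.168.1.12 -> 192.168.1.13");
        System.out.println(countIps(sample));
    }
}
```

`merge` replaces the `containsKey` / `put` pair with a single call, so each address is looked up once per occurrence.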

Upvotes: 1

Wes Cumberland

Reputation: 1338

Here is some code that uses a HashMap to store the IPs and a regex to match them in each line. It uses try-with-resources to automatically close the file.

EDIT: I added code to print in descending order like you asked in the other answer.

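import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

void read(String fileName) throws IOException {
    //Step 1 find and register IPs and store their occurrence counts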
    HashMap<String, Integer> ipAddressCounts = new HashMap<>();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
        // The dot is escaped so it matches a literal '.' rather than any character
        Pattern findIPAddrPattern = Pattern.compile("((\\d+\\.){3}\\d+)");
        String line;
        while ((line = br.readLine()) != null) {
            Matcher matcher = findIPAddrPattern.matcher(line);
            while (matcher.find()) {
                String ipAddr = matcher.group(0);
                if ( ipAddressCounts.get(ipAddr) == null ) {
                    ipAddressCounts.put(ipAddr, 1);
                }
                else {
                    ipAddressCounts.put(ipAddr, ipAddressCounts.get(ipAddr) + 1);
                }
            }
        }
    }

    //Step 2 reverse the map to store IPs by their frequency
    HashMap<Integer, HashSet<String>> countToAddrs = new HashMap<>();
    for (Map.Entry<String, Integer> entry : ipAddressCounts.entrySet()) {
        Integer count = entry.getValue();
        if ( countToAddrs.get(count) == null )
            countToAddrs.put(count, new HashSet<String>());
        countToAddrs.get(count).add(entry.getKey());
    }

    //Step 3 sort and print the ip addresses, most frequent first
    ArrayList<Integer> allCounts = new ArrayList<>(countToAddrs.keySet());
    Collections.sort(allCounts, Collections.reverseOrder());
    for (Integer count : allCounts) {
        for (String ip : countToAddrs.get(count)) {
            System.out.println("ip, count: " + ip + " , " + count);
        }
    }
}
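
As an alternative to reversing the map in Steps 2 and 3, the entries can be sorted directly. The following is only a sketch under a few assumptions: the counts are already in a `Map<String, Integer>` like `ipAddressCounts` above, and the class name `IpCountSorter` and method `sortByCountDescending` are hypothetical. Ties are broken by comparing the IP strings lexicographically, which is not the same as numeric IP order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names are illustrative, not from the original post.
class IpCountSorter {
    // Returns the map's entries ordered by count (highest first),
    // with equal counts ordered by the IP string.
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("192.168.1.19", 4);
        counts.put("192.168.1.27", 19);
        counts.put("192.168.1.13", 19);
        counts.put("192.168.1.12", 28);
        for (Map.Entry<String, Integer> entry : sortByCountDescending(counts)) {
            System.out.println("ip, count: " + entry.getKey() + " , " + entry.getValue());
        }
    }
}
```

This keeps a single map and one sort, at the cost of copying the entries into a list.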

Upvotes: 0
