Reputation: 33
I have a problem with my code. I need to do several operations on a log file with this structure:
190.12.1.100 2011-03-02 12:12 test.html
190.12.1.100 2011-03-03 13:18 data.html
128.33.100.1 2011-03-03 15:25 test.html
128.33.100.1 2011-03-04 18:30 info.html
I need to get the number of visits per month, number of visits per page and number of unique visitors based on the IP. That is not the question, I managed to get all three operations working. The problem is, only the first choice runs correctly while the other choices just return values of 0 afterwards, as if the file is empty, so i am guessing i made a mistake with the I/O somewhere. Here's the code:
import java.io.*;
import java.util.*;
public class WebServerAnalyzer {
private Map<String, Integer> hm1;
private Map<String, Integer> hm2;
private int[] months;
private Scanner input;
public WebServerAnalyzer() throws IOException {
hm1 = new HashMap<String, Integer>();
hm2 = new HashMap<String, Integer>();
months = new int[12];
for (int i = 0; i < 12; i++) {
months[i] = 0;
}
File file = new File("webserver.log");
try {
input = new Scanner(file);
} catch (FileNotFoundException fne) {
input = null;
}
}
public String nextLine() {
String line = null;
if (input != null && input.hasNextLine()) {
line = input.nextLine();
}
return line;
}
public int getMonth(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
if (dtok.countTokens() == 3) {
String year = dtok.nextToken();
String month = dtok.nextToken();
String day = dtok.nextToken();
int m = Integer.parseInt(month);
return m;
}
}
return -1;
}
public String getIP(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
return ip;
}
return null;
}
public String getPage(String line) {
StringTokenizer tok = new StringTokenizer(line);
if (tok.countTokens() == 4) {
String ip = tok.nextToken();
String date = tok.nextToken();
String hour = tok.nextToken();
String page = tok.nextToken();
StringTokenizer dtok = new StringTokenizer(date, "-");
return page;
}
return null;
}
public void visitsPerMonth() {
String line = null;
do {
line = nextLine();
if (line != null) {
int m = getMonth(line);
if (m != -1) {
months[m - 1]++;
}
}
} while (line != null);
// Print the result
String[] monthName = {"JAN ", "FEB ", "MAR ",
"APR ", "MAY ", "JUN ", "JUL ", "AUG ", "SEP ",
"OCT ", "NOV ", "DEC "};
for (int i = 0; i < 12; i++) {
System.out.println(monthName[i] + months[i]);
}
}
public int count() throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream("webserver.log"));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
}
return count;
} finally {
is.close();
}
}
public void UniqueIP() throws IOException{
String line = null;
for (int x = 0; x <count(); x++){
line = nextLine();
if (line != null) {
if(hm1.containsKey(getIP(line)) == false) {
hm1.put(getIP(line), 1);
} else {
hm1.put(getIP(line), hm1.get(getIP(line)) +1 );
}
}
}
Set set = hm1.entrySet();
Iterator i = set.iterator();
System.out.println("\nNumber of unique visitors: " + hm1.size());
while(i.hasNext()) {
Map.Entry me = (Map.Entry)i.next();
System.out.print(me.getKey() + " - ");
System.out.println(me.getValue() + " visits");
}
}
public void pageVisits() throws IOException{
String line = null;
for (int x = 0; x <count(); x++){
line = nextLine();
if (line != null) {
if(hm2.containsKey(getPage(line)) == false)
hm2.put(getPage(line), 1);
else
hm2.put(getPage(line), hm2.get(getPage(line)) +1 );
}
}
Set set = hm2.entrySet();
Iterator i = set.iterator();
System.out.println("\nNumber of pages visited: " + hm2.size());
while(i.hasNext()) {
Map.Entry me = (Map.Entry)i.next();
System.out.print(me.getKey() + " - ");
System.out.println(me.getValue() + " visits");
}
}
Any help figuring out the problem would be much appreciated as I am quite stuck.
Upvotes: 0
Views: 250
Reputation: 34014
The reset
method of BufferedReader
that Thomas recommended would only work if the file size is smaller than the buffer size or if you called mark with a large enough read ahead limit.
I would recommend reading throught the file once and to update your maps and month array for each line. BTW, you don't need a Scanner just to read lines, BufferedReader has a readLine
method itself.
BufferedReader br = ...;
String line;
while (null != (line = br.readLine())) {
String ip = getIP(line);
String page = getPage(line);
int month = getMonth(line);
// update hashmaps and arrays
}
Upvotes: 2
Reputation: 88707
I didn't read the code thoroughly yet, but I guess you're not setting the read position back to the beginning of the file when you start a new operation. Thus nextLine()
would return null.
You should create a new Scanner for each operation and close it afterwards. AFAIK scanner doesn't provide a method to go back to the first byte.
Currently I could also think of 3 alternatives:
Use a BufferedReader
and call reset()
for each new operation. This should cause the reader to go back to byte 0 provided you didn't call mark()
somewhere.
Read the file contents once and iterate over the lines in memory, i.e. put all lines into a List<String>
and then start at each line.
Read the file once, parse each line and construct an apropriate data structure that contains the data you need. For example, you could use a TreeMap<Date, Map<Page, Map<IPAdress, List<Visit>>>>
, i.e. you'd store the visits per ip address per page for each date. You could then select the appropriate submaps by date, page and ip address.
Upvotes: 4