Reputation: 43
I'm trying to loop through a log text file that contains SSH logins and other log entries.
The program returns the total number of SSH logins.
My solution works but seems a bit slow (~3.5 s on a 200 Mo file). I would like to know if there is any way to make it faster. I'm not really familiar with good practices in Java.
I'm using the BufferedReader class. Maybe there are better classes or methods, but everything else I found online was slower.
BufferedReader br;
if (fileLocation != null) {
    br = new BufferedReader(new FileReader(fileLocation));
}
else {
    br = new BufferedReader((new InputStreamReader(System.in, "UTF-8")));
}
String line;
Stack<String> users = new Stack<>();
int succeeded = 0;
int failed;
int total = 0;
if (!br.ready()) {
    help("Cannot read the file", true);
}
while ((line = br.readLine()) != null)
{
    if (!line.contains("sshd")) continue;
    String[] arr = line.split("\\s+");
    if (arr.length < 11) continue;
    String log = arr[4];
    String log2 = arr[5];
    String log3 = arr[8];
    String user = arr[10];
    if (!log.contains("sshd")) continue;
    if (!log2.contains("Accepted")) {
        if (log3.contains("failure")) {
            total++;
        }
        continue;
    }
    total++;
    succeeded++;
    if (!repeat) {
        if (users.contains(user)) continue;
        users.add(user);
    }
    System.out.println((total + 1) + " " + user);
}
Full code: https://pastebin.com/xp2P9wja
Also, here are some lines from the log file:
Dec 3 12:20:12 k332 sshd[25206]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.147.222.137
Dec 3 12:20:14 k332 sshd[25204]: error: PAM: Authentication failure for illegal user admin from 10.147.222.137
Dec 3 12:20:14 k332 sshd[25204]: Failed keyboard-interactive/pam for invalid user admin from 10.147.222.137 port 36417 ssh2
Dec 3 12:20:14 k332 sshd[25204]: Connection closed by invalid user admin 10.147.222.137 port 36417 [preauth]
Dec 3 12:20:40 k332 sshd[25209]: pam_tally2(sshd:auth): Tally overflowed for user root
The final output is:
Total :
103 unique IP SSH logins succeeded
30387 SSH logins succeeded
17186 SSH logins failed
47573 total SSH logins
Thanks for your time!
EDIT: Mo (mégaoctet) = MB (megabyte); we usually say Mo in French.
Here's the full updated code if anyone needs it: https://pastebin.com/Kn5EqLNX
Upvotes: 3
Views: 389
Reputation: 230
If you profile your code, it becomes clear that most of the time is spent in the String.split() method.
This is a known issue in the standard Java library; see "Java split String performances".
To speed up your code, you therefore need to speed up this part. The first thing I can suggest is to replace the code on lines 75-79 with this:
Pattern pattern = Pattern.compile("\\s+");
while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = pattern.split(line);
    if (arr.length < 11) continue;
    ...
}
This may speed up the code a bit, but you can see from the profile that a lot of time is still spent in Pattern and Matcher methods. We need to get rid of Pattern and Matcher for a significant speedup.
For a single-character pattern that is not a regex metacharacter, split takes a fast path that does not use the regex engine at all and is quite efficient. Let's try replacing the code with:
while ((line = br.readLine()) != null) {
    if (!line.contains("sshd")) continue;
    String[] arr = Arrays.stream(line.split(" "))
            .filter(s -> !s.isEmpty())
            .toArray(String[]::new);
    if (arr.length < 11) continue;
    ...
}
This code runs almost twice as fast on the same data.
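If you want to convince yourself that the two ways of splitting give the same tokens, here is a minimal self-contained sketch. The class name SplitSketch and the helper names regexSplit and fastSplit are made up for illustration, and the sample line is an assumption: it is modelled on the log lines in the question, with two spaces after "Dec" as syslog often pads single-digit days.

```java
import java.util.Arrays;

class SplitSketch {
    // Regex-based split on runs of whitespace (goes through Pattern/Matcher).
    static String[] regexSplit(String line) {
        return line.split("\\s+");
    }

    // Split on a single space (no regex engine involved), then drop the
    // empty tokens produced by consecutive spaces.
    static String[] fastSplit(String line) {
        return Arrays.stream(line.split(" "))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the real log file.
        String line = "Dec  3 12:20:14 k332 sshd[25204]: Failed keyboard-interactive/pam"
                + " for invalid user admin from 10.147.222.137 port 36417 ssh2";
        String[] a = regexSplit(line);
        String[] b = fastSplit(line);
        // Both approaches should yield the same tokens for this line.
        System.out.println(Arrays.equals(a, b));
        System.out.println(b[4] + " " + b[5]);
    }
}
```

Note that the two are not equivalent in every case: the single-space version does not split on tabs, and for a line that starts with whitespace the regex version returns a leading empty token while the filtered version does not. For log lines like the ones shown in the question this should not matter, but check it against your own data.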
Upvotes: 6