arsenal

Reputation: 24154

Parsing a .txt file (considering performance measure)

DurationOfRun:5
ThreadSize:10
ExistingRange:1-1000
NewRange:5000-10000
Percentage:55 - AutoRefreshStoreCategories  Data:Previous/30,New/70    UserLogged:true/50,false/50      SleepTime:5000     AttributeGet:1,16,10106,10111       AttributeSet:2060/30,10053/27
Percentage:25 - CrossPromoEditItemRule      Data:Previous/60,New/40    UserLogged:true/50,false/50      SleepTime:4000     AttributeGet:1,10107                AttributeSet:10108/34,10109/25
Percentage:20 - CrossPromoManageRules       Data:Previous/30,New/70    UserLogged:true/50,false/50      SleepTime:2000     AttributeGet:1,10107                AttributeSet:10108/26,10109/21

I am trying to parse the .txt file above. The first four lines are fixed; the Percentage lines at the end can vary in number (there may be more than three). I wrote the code below and it works, but it looks messy. Is there a better way to parse this file? And if performance matters, what is the best way to parse it?

private static int noOfThreads;
private static List<Command> commands;
public static int startRange;
public static int endRange;
public static int newStartRange;
public static int newEndRange;
private static BufferedReader br = null;
private static String sCurrentLine = null;
private static List<String> values;
private static String commandName;
private static String percentage;
private static List<String> attributeIDGet;
private static List<String> attributeIDSet;
private static LinkedHashMap<String, Double> dataCriteria;
private static LinkedHashMap<Boolean, Double> userLoggingCriteria;
private static long sleepTimeOfCommand;
private static long durationOfRun;

br = new BufferedReader(new FileReader("S:\\Testing\\PDSTest1.txt"));
values = new ArrayList<String>();

while ((sCurrentLine = br.readLine()) != null) {
    if(sCurrentLine.startsWith("DurationOfRun")) {
        durationOfRun = Long.parseLong(sCurrentLine.split(":")[1]);
    } else if(sCurrentLine.startsWith("ThreadSize")) {
        noOfThreads = Integer.parseInt(sCurrentLine.split(":")[1]);
    } else if(sCurrentLine.startsWith("ExistingRange")) {
        startRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
        endRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
    } else if(sCurrentLine.startsWith("NewRange")) {
        newStartRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[0]);
        newEndRange = Integer.parseInt(sCurrentLine.split(":")[1].split("-")[1]);
    } else {
        attributeIDGet =  new ArrayList<String>();
        attributeIDSet =  new ArrayList<String>();
        dataCriteria = new LinkedHashMap<String, Double>();
        userLoggingCriteria = new LinkedHashMap<Boolean, Double>();

        percentage = sCurrentLine.split("-")[0].split(":")[1].trim();
        values = Arrays.asList(sCurrentLine.split("-")[1].trim().split("\\s+"));
        for(String s : values) {
            if(s.startsWith("Data")) {
                String[] data = s.split(":")[1].split(",");
                for (String n : data) {
                    dataCriteria.put(n.split("/")[0], Double.parseDouble(n.split("/")[1]));
                }
                //dataCriteria.put(data.split("/")[0], value)
            } else if(s.startsWith("UserLogged")) {
                String[] userLogged = s.split(":")[1].split(",");
                for (String t : userLogged) {
                    userLoggingCriteria.put(Boolean.parseBoolean(t.split("/")[0]), Double.parseDouble(t.split("/")[1]));
                }
                //userLogged = Boolean.parseBoolean(s.split(":")[1]);
            } else if(s.startsWith("SleepTime")) {
                sleepTimeOfCommand = Long.parseLong(s.split(":")[1]);
            } else if(s.startsWith("AttributeGet")) {
                String[] strGet = s.split(":")[1].split(",");
                for(String q : strGet) attributeIDGet.add(q); 
            } else if(s.startsWith("AttributeSet:")) {
                String[] strSet = s.split(":")[1].split(",");
                for(String p : strSet) attributeIDSet.add(p); 
            } else {
                commandName = s;
            }
        }
        Command command = new Command();
        command.setName(commandName);
        command.setExecutionPercentage(Double.parseDouble(percentage));
        command.setAttributeIDGet(attributeIDGet);
        command.setAttributeIDSet(attributeIDSet);
        command.setDataUsageCriteria(dataCriteria);
        command.setUserLoggingCriteria(userLoggingCriteria);
        command.setSleepTime(sleepTimeOfCommand);
        commands.add(command);
    }
}

Upvotes: 0

Views: 352

Answers (3)

paxdiablo

Reputation: 881423

Well, parsers usually are messy once you get down to the lower layers of them :-)

However, one possible improvement, at least in terms of code quality, would be to recognize the fact that your grammar is layered.

By that, I mean every line is an identifying token followed by some properties.

In the case of DurationOfRun, ThreadSize, ExistingRange and NewRange, the properties are relatively simple. Percentage is somewhat more complex but still okay.

I would structure the code as (pseudo-code):

def parseFile (fileHandle):
    while (currentLine = fileHandle.getNextLine()) != EOF:
        if currentLine.beginsWith ("DurationOfRun:"):
            processDurationOfRun (currentLine[14:])

        elsif currentLine.beginsWith ("ThreadSize:"):
            processThreadSize (currentLine[11:])

        elsif currentLine.beginsWith ("ExistingRange:"):
            processExistingRange (currentLine[14:])

        elsif currentLine.beginsWith ("NewRange:"):
            processNewRange (currentLine[9:])

        elsif currentLine.beginsWith ("Percentage:"):
            processPercentage (currentLine[11:])

        else
            raise error

Then, in each of those processWhatever() functions, you parse the remainder of the line based on the expected format. That keeps your code small and readable and easily changed in future, without having to navigate a morass :-)

For example, processDurationOfRun() simply gets an integer from the remainder of the line:

def processDurationOfRun (line):
    this.durationOfRun = line.parseAsInt()

Similarly, the functions for the two ranges split the string on - and get two integers from the resultant values:

def processExistingRange (line):
    values[] = line.split("-")
    this.existingRangeStart = values[0].parseAsInt()
    this.existingRangeEnd   = values[1].parseAsInt()
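In Java, the dispatch and the simple handlers above might look roughly like the sketch below. The class name HeaderConfig and its field names are hypothetical, chosen only for illustration, and the Percentage case is left out here because it needs its own layer.

```java
// Hypothetical holder for the four header values; all names are illustrative.
final class HeaderConfig {
    long durationOfRun;
    int threadSize;
    int existingRangeStart, existingRangeEnd;
    int newRangeStart, newRangeEnd;

    // Top layer: identify the line by the key before ':' and delegate.
    void parseLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        String key = line.substring(0, colon);
        String rest = line.substring(colon + 1).trim();
        switch (key) {
            case "DurationOfRun":
                durationOfRun = Long.parseLong(rest);
                break;
            case "ThreadSize":
                threadSize = Integer.parseInt(rest);
                break;
            case "ExistingRange": {
                int[] r = parseRange(rest);
                existingRangeStart = r[0];
                existingRangeEnd = r[1];
                break;
            }
            case "NewRange": {
                int[] r = parseRange(rest);
                newRangeStart = r[0];
                newRangeEnd = r[1];
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown line type: " + key);
        }
    }

    // Lower layer: split "1-1000" on '-' and parse both halves as integers.
    static int[] parseRange(String text) {
        String[] parts = text.split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected from-to, found: " + text);
        }
        return new int[] {
            Integer.parseInt(parts[0].trim()),
            Integer.parseInt(parts[1].trim())
        };
    }
}
```

Each case stays one or two lines long, so adding a new header type means adding one case and, if needed, one small helper.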

The processPercentage() function is the tricky one but that is also easily doable if you layer it as well. Assuming those things are always in the same order, it consists of:

  • an integer;
  • a literal -;
  • some sort of textual category; and
  • a series of key:value pairs.

And even these values within the pairs can be parsed by lower levels, splitting first on commas to get subvalues like Previous/30 and New/70, then splitting each of those subvalues on slashes to get individual items. That way, a logical hierarchy can be reflected in your code.
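A minimal Java sketch of that layering follows, assuming the fields always appear in the order shown in the question and that every slash-separated weight is an integer. The names PercentageLineParser, ParsedCommand, parsePair and parseWeighted are hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for one "Percentage:" line; all names are illustrative.
final class PercentageLineParser {

    static final class ParsedCommand {
        int percentage;
        String name;
        long sleepTime;
        final Map<String, Integer> data = new LinkedHashMap<>();
        final Map<String, Integer> userLogged = new LinkedHashMap<>();
        final List<String> attributeGet = new ArrayList<>();
        final Map<String, Integer> attributeSet = new LinkedHashMap<>();
    }

    // Layer 1: integer, literal "-", category name, then key:value pairs.
    static ParsedCommand parse(String line) {
        String prefix = "Percentage:";
        if (!line.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a Percentage line: " + line);
        }
        String[] headAndTail = line.substring(prefix.length()).split("\\s+-\\s+", 2);
        if (headAndTail.length != 2) {
            throw new IllegalArgumentException("Missing ' - ' separator: " + line);
        }
        String[] tokens = headAndTail[1].trim().split("\\s+");

        ParsedCommand cmd = new ParsedCommand();
        cmd.percentage = Integer.parseInt(headAndTail[0].trim());
        cmd.name = tokens[0];
        for (int i = 1; i < tokens.length; i++) {
            parsePair(tokens[i], cmd);
        }
        return cmd;
    }

    // Layer 2: one key:value pair, dispatched on the key.
    private static void parsePair(String pair, ParsedCommand cmd) {
        String[] kv = pair.split(":", 2);
        if (kv.length != 2) {
            throw new IllegalArgumentException("Expected key:value, found: " + pair);
        }
        switch (kv[0]) {
            case "Data":         parseWeighted(kv[1], cmd.data); break;
            case "UserLogged":   parseWeighted(kv[1], cmd.userLogged); break;
            case "SleepTime":    cmd.sleepTime = Long.parseLong(kv[1]); break;
            case "AttributeGet": cmd.attributeGet.addAll(Arrays.asList(kv[1].split(","))); break;
            case "AttributeSet": parseWeighted(kv[1], cmd.attributeSet); break;
            default:
                throw new IllegalArgumentException("Unknown key: " + kv[0]);
        }
    }

    // Layer 3: "Previous/30,New/70" -> {Previous=30, New=70}.
    private static void parseWeighted(String value, Map<String, Integer> target) {
        for (String item : value.split(",")) {
            String[] parts = item.split("/");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected name/weight, found: " + item);
            }
            target.put(parts[0], Integer.parseInt(parts[1]));
        }
    }
}
```

Each method handles exactly one level of the format, so a change to, say, the weight syntax touches only parseWeighted.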

Unless you're expecting to parse this text file many times per second, or unless it's many megabytes in size, I'd be more concerned about the readability and maintainability of your code than the speed of the parsing.

Mostly gone are the days when we needed to wring the last ounce of performance from our code, but we still have trouble fixing said code in a timely manner when bugs are found or enhancements are desired.

Sometimes it's preferable to optimise for readability.

Upvotes: 2

user unknown

Reputation: 36229

The newer and more convenient class is Scanner. You just set the delimiter and then read the data in the desired format (nextInt, nextLong) in one go, with no need for separate x.parseX calls.
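As a small illustration of the delimiter idea, the sketch below reads one range line from a String rather than a file. The class and method names are hypothetical. Note that treating - as a delimiter assumes the values are never negative.

```java
import java.util.Scanner;

// Hypothetical demo class; reads "ExistingRange:1-1000" with a custom delimiter.
final class ScannerDelimiterDemo {

    static int[] readRange(String text) {
        try (Scanner sc = new Scanner(text)) {
            // Whitespace, ':' and '-' all separate tokens; '+' collapses runs
            // of them so that no empty tokens are produced.
            sc.useDelimiter("[\\s:-]+");
            sc.next();                 // skip the label, e.g. "ExistingRange"
            int from = sc.nextInt();   // Scanner converts the token itself
            int to = sc.nextInt();
            return new int[] { from, to };
        }
    }
}
```

Here the tokens are ExistingRange, 1 and 1000, so the two nextInt calls pick up the bounds without any explicit split or parseInt.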

Second: Split your code into small, reusable pieces. They make the program readable, and you can hide details easily.

Don't hesitate to use a struct-like class for a range, for example. Such classes let a method return multiple values without boilerplate (getters, setters, constructors).

import java.util.*;
import java.io.*;

public class ReadSampleFile
{
    // struct like classes:
    static class PercentageRow {
        public int percentage;
        public String name;
        public int dataPrevious;
        public int dataNew;
        public int userLoggedTrue;
        public int userLoggedFalse;
        public List<Integer> attributeGet;
        public List<Integer> attributeSet;
    }
    static class Range {
        public int from;
        public int to;
    }

    // Each reader consumes the label token, checks it, then reads the value(s).
    private int readInt (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            return sc.nextInt ();
        }
        throw err (name + " expected, found: " + s);
    }

    private long readLong (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            return sc.nextLong ();
        }
        throw err (name + " expected, found: " + s);
    }

    private Range readRange (String name, Scanner sc) {
        String s = sc.next ();
        if (s.startsWith (name)) {
            Range r = new Range ();
            r.from = sc.nextInt ();
            r.to = sc.nextInt ();
            return r;
        }
        throw err (name + " expected, found: " + s);
    }

    private PercentageRow readPercentageLine (Scanner sc) {
        // reuse above methods
        PercentageRow percentageRow = new PercentageRow ();
        percentageRow.percentage = readInt ("Percentage", sc);
        // ... (the remaining tokens of the row must be consumed here,
        // otherwise the next call will not start at "Percentage")
        return percentageRow;
    }

    public ReadSampleFile () throws FileNotFoundException
    {
        /* I only read from my sourcefile for convenience.
        So I could scroll up to see what's the next entry.
        Don't do this at home. :) The dummy later ...
        */
        Scanner sc = new Scanner (new File ("./ReadSampleFile.java"));
        // '+' collapses runs of delimiters, so no empty tokens are produced
        sc.useDelimiter ("[\\s/,:-]+");
        // ... is the comment I had to insert.
        String dummy = sc.nextLine ();
        if (sc.hasNext ()) {
            // see how nice the data structure is reflected
            // by this code:
            long duration = readLong ("DurationOfRun", sc);
            int noOfThreads = readInt ("ThreadSize", sc);
            Range eRange = readRange ("ExistingRange", sc);
            Range nRange = readRange ("NewRange", sc);
            List <PercentageRow> percentageRows = new ArrayList <PercentageRow> ();
            // including the repetition ...
            while (sc.hasNext ()) {
                percentageRows.add (readPercentageLine (sc));
            }
        }
        sc.close ();
    }

    public static void main (String args[])  throws FileNotFoundException
    {
        new ReadSampleFile ();
    }

    // Builds the exception that the readers throw on an unexpected label.
    private static IllegalStateException err (String msg)
    {
        return new IllegalStateException ("Err:\t" + msg);
    }
}

Upvotes: 0

Tony Ennis

Reputation: 12299

I would not worry about performance until I was sure there was actually a performance issue. Regarding the rest of the code, if you won't be adding any new line types I would not worry about it. If you do worry about it, however, a factory design pattern can help you separate the selection of the type of processing needed from the actual processing. It makes adding new line types easier without introducing as much opportunity for error.
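One simple way to get that separation is a registry that maps each line's key to a handler, which is a lightweight variant of the factory idea rather than the textbook pattern. The interface and class names below are hypothetical, and the Map passed to each handler merely stands in for whatever configuration object the real code would fill.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler interface: one implementation per line type.
interface LineHandler {
    void handle(String payload, Map<String, String> out);
}

// Selects the handler for a line; knows nothing about how lines are processed.
final class LineHandlerFactory {
    private final Map<String, LineHandler> handlers = new HashMap<>();

    void register(String key, LineHandler handler) {
        handlers.put(key, handler);
    }

    LineHandler forLine(String line) {
        String key = keyOf(line);
        LineHandler handler = handlers.get(key);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for line type: " + key);
        }
        return handler;
    }

    // The key is everything before the first ':'.
    static String keyOf(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        return line.substring(0, colon);
    }

    // The payload is everything after the first ':'.
    static String payloadOf(String line) {
        return line.substring(line.indexOf(':') + 1).trim();
    }
}
```

Supporting a new line type then means registering one more handler, for example factory.register ("ThreadSize", (payload, out) -> out.put ("threads", payload)), without touching the selection logic.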

Upvotes: 1
