Nick
Nick

Reputation: 892

Java: Marshalling using JaxB to XML, how to properly multithread

I am trying to take a very long file of strings and convert it to an XML according to a schema I was given. I used jaxB to create classes from that schema. Since the file is very large I created a thread pool to improve the performance but since then it only processes one line of the file and marshalls it to the XML file, per thread.

Below is my home class where I read from the file. Each line is a record of a transaction, for every new user encountered a list is made to store all of that users transactions and each list is put into a HashMap. I made it a ConcurrentHashMap because multiple threads will work on the map simultaneously, is this the correct thing to do?

After the lists are created a thread is made for each user. Each thread runs the method ProcessCommands below and receives from home the list of transactions for its user.

public class home{
  public static File XMLFile = new File("LogFile.xml");
  Map<String,List<String>> UserMap= new ConcurrentHashMap<String,List<String>>();
  String[] UserNames =  new String[5000];
    int numberOfUsers = 0;
    try{
        BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
            String line;
            while ((line = reader.readLine()) != null)
            {
                parsed = line.split(",|\\s+");
                if(!parsed[2].equals("./testLOG")){
                    if(Utilities.checkUserExists(parsed[2], UserNames) == false){ //User does not already exist
                        System.out.println("New User: " + parsed[2]);
                        UserMap.put(parsed[2],new ArrayList<String>());         //Create list of transactions for new user
                        UserMap.get(parsed[2]).add(line);                       //Add First Item to new list
                        UserNames[numberOfUsers] = parsed[2];                   //Add new user
                        numberOfUsers++;
                    }
                    else{                                                           //User Already Existed
                        UserMap.get(parsed[2]).add(line);
                    }
                }
            }
            reader.close();
    } catch (IOException x) {
        System.err.println(x);
    }

    //get start time
    long startTime = new Date().getTime();
    tCount = numberOfUsers;
    ExecutorService threadPool = Executors.newFixedThreadPool(tCount);
    for(int i = 0; i < numberOfUsers; i++){
        System.out.println("Starting Thread " + i + " for user " + UserNames[i]);
        Runnable worker = new ProcessCommands(UserMap.get(UserNames[i]),UserNames[i], XMLfile);
        threadPool.execute(worker);
    }
    threadPool.shutdown();
    while(!threadPool.isTerminated()){

    }
    System.out.println("Finished all threads");

}

Here is the ProcessCommands class. The thread receives the list for its user and creates a marshaller. From what I unserstand marshalling is not thread safe so it is best to create one for each thread, is this the best way to do that?

When I create the marshallers I know that each from (from each thread) will want to access the created file causing conflicts, I used synchronized, is that correct?

As the thread iterates through it's list, each line calls for a certain case. There are a lot so I just made pseudo-cases for clarity. Each case calls the function below.

public class ProcessCommands implements Runnable{
private static final boolean DEBUG = false;
private List<String> list = null;
private String threadName;
private File XMLfile = null;
public Thread myThread;


public ProcessCommands(List<String> list, String threadName, File XMLfile){
    this.list = list;
    this.threadName = threadName;
    this.XMLfile = XMLfile;
}

public void run(){
    Date start = null;
    int transactionNumber = 0;
    String[] parsed = new String[8];
    String[] quoteParsed = null;
    String[] universalFormatCommand = new String[9];
    String userCommand = null;
    Connection connection = null;
    Statement stmt = null;
    Map<String, UserObject> usersMap = null;
    Map<String, Stack<BLO>> buyMap = null;
    Map<String, Stack<SLO>> sellMap = null;
    Map<String, QLO> stockCodeMap = null;
    Map<String, BTO> buyTriggerMap = null;
    Map<String, STO> sellTriggerMap = null;
    Map<String, USO> usersStocksMap = null;
    String SQL = null;
    int amountToAdd = 0;
    int tempDollars = 0;
    UserObject tempUO = null;
    BLO tempBLO = null;
    SLO tempSLO = null;
    Stack<BLO> tempStBLO = null;
    Stack<SLO> tempStSLO = null;
    BTO tempBTO = null;
    STO tempSTO = null;
    USO tempUSO = null;
    QLO tempQLO = null;
    String stockCode = null;
    String quoteResponse = null;
    int usersDollars = 0;
    int dollarAmountToBuy = 0;
    int dollarAmountToSell = 0;
    int numberOfSharesToBuy = 0;
    int numberOfSharesToSell = 0;
    int quoteStockInDollars = 0;
    int shares = 0;
    Iterator<String> itr = null;

    int transactionCount = list.size();
    System.out.println("Starting "+threadName+" - listSize = "+transactionCount);

    //UO dollars, reserved
    usersMap  = new HashMap<String, UserObject>(3);  //userName -> UO

    //USO shares
    usersStocksMap = new HashMap<String, USO>(); //userName+stockCode -> shares

    //BLO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    buyMap = new HashMap<String, Stack<BLO>>();  //userName -> Stack<BLO>

    //SLO code, timestamp, dollarAmountToSell, stockPriceInDollars
    sellMap = new HashMap<String, Stack<SLO>>();  //userName -> Stack<SLO>

    //BTO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    buyTriggerMap = new ConcurrentHashMap<String, BTO>();  //userName+stockCode -> BTO

    //STO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    sellTriggerMap = new HashMap<String, STO>();  //userName+stockCode -> STO

    //QLO timestamp, stockPriceInDollars
    stockCodeMap = new HashMap<String, QLO>();  //stockCode -> QLO



    //create user object and initialize stacks
    usersMap.put(threadName, new UserObject(0, 0));
    buyMap.put(threadName, new Stack<BLO>());
    sellMap.put(threadName, new Stack<SLO>());
    try {
        //Marshaller marshaller = getMarshaller();
        synchronized (this){
            Marshaller marshaller = init.jc.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
            marshaller.marshal(LogServer.Root,XMLfile);
            marshaller.marshal(LogServer.Root,System.out);
        }
    } catch (JAXBException M) {
        M.printStackTrace();
    }

    Date timing = new Date();
    //universalFormatCommand = new String[8];
    parsed = new String[8];
    //iterate through workload file
    itr = this.list.iterator();
    while(itr.hasNext()){
        userCommand = (String) itr.next(); 
        itr.remove();
        parsed = userCommand.split(",|\\s+");
        transactionNumber = Integer.parseInt(parsed[0].replaceAll("\\[", "").replaceAll("\\]", ""));
        universalFormatCommand = Utilities.FormatCommand(parsed, parsed[0]);
        if(transactionNumber % 100 == 0){
            System.out.println(this.threadName + " - " +transactionNumber+ " - "+(new Date().getTime() - timing.getTime())/1000);
        }
        /*System.out.print("UserCommand " +transactionNumber + ": ");
        for(int i = 0;i<8;i++)System.out.print(universalFormatCommand[i]+ " ");
        System.out.print("\n");*/
        //switch for user command
        switch (parsed[1].toLowerCase()) {

        case "One"
            *Do Stuff"
            LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
            break;
        case "Two"
            *Do Stuff"
            LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
            break;
        }
     }
  }

The function create_Log has multiple cases so as before, for clarity I just left one. The case "QUOTE" only calls one object creation function but other other cases can create multiple objects. The type 'log' is a complex XML type that defines all the other object types so in each call to create_Log I create a log type called Root. The class 'log' generated by JaxB included a function to create a list of objects. The statement:

Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);

takes the root element I created, creates a list and adds the newly created object 'quote_QuoteType' to that list. Before I added threading this method successfully created a list of as many objects as I wanted then marshalled them. So I'm pretty positive the bit in class 'LogServer' is not the issue. It is something to do with the marshalling and syncronization in the ProcessCommands class above.

public class LogServer{
    public static log Root = new log();

    public static QuoteServerType Log_Quote(String[] input, int TransactionNumber){
    ObjectFactory factory = new ObjectFactory();
    QuoteServerType quoteCall = factory.createQuoteServerType();

    **Populate the QuoteServerType object called quoteCall**

    return quoteCall;
    }

    public static void create_Log(String[] input, int TransactionNumber, CommandType Command){
    System.out.print("TRANSACTION "+TransactionNumber + " is " + Command + ": ");
    for(int i = 0; i<input.length;i++) System.out.print(input[i] + " ");
    System.out.print("\n");
    switch(input[1]){
    case "QUOTE":
        System.out.print("QUOTE CASE");
        QuoteServerType quote_QuoteType = Log_Quote(input,TransactionNumber);
        Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);
        break;
        }
      }

Upvotes: 0

Views: 1700

Answers (1)

Zielu
Zielu

Reputation: 8552

So you wrote a lot of code, but have you try if it is actually working? After quick look I doubt it. You should test your code logic part by part not going all the way till the end. It seems you are just staring with Java. I would recommend practice first on simple one threaded applications. Sorry if I sound harsh, but I will try to be constructive as well:

  1. Per convention, the classes names are starts with capital letter, variables by small, you do it other way.
  2. You should make a method in you home (Home) class not a put all your code in the static block.
  3. You are reading the whole file to the memory, you do not process it line by line. After the Home is initialized literary whole content of file will be under UserMap variable. If the file is really large you will run out of the heap memory. If you assume large file than you cannot do it and you have to redisign your app to store somewhere partial results. If your file is smaller than memmory you could keep it like that (but you said it is large).
  4. No need for UserNames, the UserMap.containsKey will do the job
  5. Your thread pools size should be in the range of your cores not number of users as you will get thread trashing (if you have blocking operation in your code make tCount = 2*processors if not keep it as number of processors). Once one ProcessCommand finish, the executor will start another one till you finish all and you will be efficiently using all your processor cores.
  6. DO NOT while(!threadPool.isTerminated()), this line will completely consume one processor as it will be constantly checking, call awaitTermination instead
  7. Your ProcessCommand, has view map variables which will only had one entry cause as you said, each will process data from one user.
  8. The synchronized(this) is Process will not work, as each thread will synchronized on different object (different isntance of process).
  9. I believe creating marshaller is thread safe (check it) so no need to synchronization at all
  10. You save your log (whatever it is) before you did actual processing in of the transactions lists
  11. The marshalling will override content of the file with current state of LogServer.Root. If it is shared bettween your proccsCommand (seems so) what is the point in saving it in each thread. Do it once you are finished.
  12. You dont need itr.remove();
  13. The log class (for the ROOT variable !!!) needs to be thread-safe as all the threads will call the operations on it (so the list inside the log class must be concurrent list etc).
  14. And so on.....

I would recommend, to

  1. Start with simple one thread version that actually works.
  2. Deal with processing line by line, (store reasults for each users in differnt file, you can have cache with transactions for recently used users so not to keep writing all the time to the disk (see guava cache)
  3. Process multithreaded each user transaction to your user log objects (again if it is a lot you have to save them to the disk not keep all in memmory).
  4. Write code that combines logs from diiffernt users to create one (again you may want to do it mutithreaded), though it will be mostly IO operations so not much gain and more tricky to do.

Good luck override cont

Upvotes: 2

Related Questions