texasdude11
texasdude11

Reputation: 887

How to parse raw log file dump and extracting relevant information

Parsing raw log file dump and extracting relevant information.

Following is the log file extract for two events. Each event is seperated by ** as seen below:

01/23/13 17:29:25 |-| *****************************************************************
01/23/13 17:29:25 |-| MTCLPrint: processCodesoftPrintRequest() [-WEB--JU-LBCH-Wed Jan 23 17:29:25 CST 2013]...
01/23/13 17:29:25 |-|   ==> CS Label: [JU , LBCH , 70005023489 , JU Filtrete UPC0 3 Up Label , JU Filtrete UPC0 Labels 3 up , JUCHIZ01 , Qty: 1100]
01/23/13 17:29:26 |-| TrkgNbr: [-WEB--JU-LBCH-Wed Jan 23 17:29:25 CST 2013] ,CSPid: 5372
01/23/13 17:29:27 |-| Sending print job to -\\JUFP01\JUCHIZ01 [-WEB--JU-LBCH-Wed Jan 23 17:29:25 CST 2013]...
01/23/13 17:29:29 |-| Cannot delete file in temp dir (mergeAndPrint) - P786406707_67724_818342796.prn
01/23/13 17:29:29 |-| MTCLPrint: processCodesoftPrintRequest() [-WEB--JU-LBCH-Wed Jan 23 17:29:25 CST 2013]...complete.
01/23/13 17:29:29 |-| ~~~~ MTCL Print Web Service is terminated. ~~~~
01/23/13 17:29:56 |-| ~~~~ MTCL Print Web Service is started. ~~~~
01/23/13 17:29:56 |-| *****************************************************************
01/23/13 17:29:56 |-| MTCLPrint: processCodesoftPrintRequest() [-WEB--SDL-P1-Wed Jan 23 17:29:56 CST 2013]...
01/23/13 17:29:56 |-|   ==> CS Label: [SDL , P1 , 70000437403 , SDL GenericShip.CS7Z170 10-2006 REV , Medina HD Two Part Inter. Label , Z170 Packer , Qty: 1]
01/23/13 17:29:56 |-| TrkgNbr: [-WEB--SDL-P1-Wed Jan 23 17:29:56 CST 2013] ,CSPid: 8840
01/23/13 17:29:58 |-| Sending print job to -\\SPPRT10\SDL-PR-Zebra03 [-WEB--SDL-P1-Wed Jan 23 17:29:56 CST 2013]...
01/23/13 17:29:58 |-| Cannot delete file in temp dir (mergeAndPrint) - P1905794774_98669_986327948.prn
01/23/13 17:29:58 |-| MTCLPrint: processCodesoftPrintRequest() [-WEB--SDL-P1-Wed Jan 23 17:29:56 CST 2013]...complete.
01/23/13 17:29:58 |-| ~~~~ MTCL Print Web Service is terminated. ~~~~
01/23/13 17:30:11 |-| ~~~~ MTCL Print Web Service is started. ~~~~

There is a series of similar repeating logs being dump by application log. I need to parse this file and enter it in MS Excel by a Java program.

Following is the relevant that I need to picked up from above raw data dump:

01/23/13 17:29:25 |-|   ==> CS Label: [JU , LBCH , 70005023489 , JU Filtrete UPC0 3 Up Label , JU Filtrete UPC0 Labels 3 up , JUCHIZ01 , Qty: 1100]

01/23/13 17:29:56 |-|   ==> CS Label: [SDL , P1 , 70000437403 , SDL GenericShip.CS7Z170 10-2006 REV , Medina HD Two Part Inter. Label , Z170 Packer , Qty: 1]

This raw data always starts with "CS Label:" and followed by a "[" then there are 7 fields separated by commas. I need to extract these seven fields into columns in an excel sheet. What can be the most efficient way to accomplish this?

Upvotes: 0

Views: 1363

Answers (2)

texasdude11
texasdude11

Reputation: 887

Here you go:

public class DateFunctions {
    // Date related Logic
    private Date startDate;
    private Integer delta;

    @SuppressWarnings("resource")
    public DateFunctions() {
        // ApplicationContext ctx = new
        // ClassPathXmlApplicationContext("date-properties.xml");
        ApplicationContext ctx = new FileSystemXmlApplicationContext(
                "c:\\logAnalyzer\\date-properties.xml");
        // Initializing date related data from spring configuration file
        startDate = (Date) ctx.getBean("startDate");
        delta = (Integer) ctx.getBean("delta");
    }

    public boolean isTimeDiffAcceptable(Date logDate) {

        DateTime dt1, dt2;
        Integer minutesDiff;

        dt1 = new DateTime(logDate);
        dt2 = new DateTime(startDate);

        int daysDiff = Days.daysBetween(dt1, dt2).getDays();

        if (daysDiff == 0) {
            minutesDiff = dt1.getMinuteOfDay() - dt2.getMinuteOfDay();
        } else {
            minutesDiff = 0;
        }

        // System.out.println("minutesDiff : " + minutesDiff);
        // System.out.println("delta : " + delta);

        if (minutesDiff < 0 && (-1 * minutesDiff) < delta) {
            return true;
        } else {
            return false;
        }
    }

    public Date getStartDate() {
        return startDate;
    }

    public void setStartDate(Date startDate) {
        this.startDate = startDate;
    }

    public Integer getDelta() {
        return delta;
    }

    public void setDelta(Integer delta) {
        this.delta = delta;
    }

}

Then the format log method will let you put the logs in an excel sheet.

public class FormatLog {

    private String searchRequiredString, searchStartString, searchEndString;
    private String regex, line;
    private Integer lengthStartStr, lengthEndStr;

    private File inputFile;

    private Integer rowNum, colNum;
    private HSSFCell cell;

    private String[] ary;

    private Logger log;

    // Initializing data in constructor
    public FormatLog() {
        // initializing the logger for log4j
        log = Logger.getLogger(FormatLog.class);

        @SuppressWarnings("resource")
        ApplicationContext ctx = new ClassPathXmlApplicationContext(
                "spring.xml");

        // Initializing search String related parameters
        searchRequiredString = (String) ctx.getBean("searchRequiredString");
        searchStartString = (String) ctx.getBean("searchStartString");
        searchEndString = (String) ctx.getBean("searchEndString");
        regex = (String) ctx.getBean("regex");

        // Initializing the Log Input File
        inputFile = (File) ctx.getBean("inputFile");

        // Computing some start parameters
        lengthStartStr = searchStartString.length();
        lengthEndStr = searchEndString.length();

        // Row Number initialized to -1
        rowNum = -1;
    }

    /*
     * Input: None Output: Integer rowNum - number of rows processed.
     */
    @SuppressWarnings({ "resource", "deprecation" })
    public Integer stringManipulation() {

        // setting header text
        String[] headerText = { "Date", "Facility Name", "Work Center",
                "ID Nbr", "Preference", "Format", "Printer Name", "Qty" };
        DateFunctions df = new DateFunctions();
        try {
            FileOutputStream fileOut = new FileOutputStream(
                    "c:\\logAnalyzer\\FormattedLogsELC.xls");
            HSSFWorkbook workbook = new HSSFWorkbook();
            HSSFSheet worksheet = workbook.createSheet("ELC Log Worksheet - "
                    + df.getStartDate().toString());

            BufferedReader br = new BufferedReader(new FileReader(inputFile));
            while ((line = br.readLine()) != null) {
                // Create the StringBuilder
                StringBuilder builder = new StringBuilder(line);

                Integer indexRequired = builder.indexOf(searchRequiredString);
                if (indexRequired > 0) {

                    String date = builder.subSequence(1, 17).toString();
                    log.debug(date);
                    Date logDate = new Date(date);
                    if (df.isTimeDiffAcceptable(logDate)) {

                        log.debug(builder.toString());
                        Integer indexStart = builder.indexOf(searchStartString);
                        Integer indexEnd = builder.indexOf(searchEndString);
                        String s = builder.subSequence(
                                indexStart + lengthStartStr,
                                indexEnd - lengthEndStr + 1).toString();
                        // adding date to the string s
                        s = date + "," + s;
                        log.debug(s);
                        ary = s.split(regex);
                        if (ary.length != headerText.length) {
                            log.warn("Review row number : " + rowNum
                                    + ". Unexpected data was found");
                        }

                        // incrementing RowNumber for workbook calculation.
                        rowNum++;
                        if (rowNum == 0) {
                            log.info("Setting Header Text");
                            ary = headerText;
                        }
                        log.info("Row Number: " + rowNum
                                + " extracted for time : " + date);
                        // Create a new wb row.
                        HSSFRow row = worksheet.createRow(rowNum);
                        for (colNum = 0; colNum < ary.length; colNum++) {
                            ary[colNum] = ary[colNum].trim();
                            log.debug(ary[colNum]);
                            cell = row.createCell(colNum);
                            cell.setCellValue(ary[colNum]);
                        }// end for
                    }// end if timeDiff
                }// end if indexRequired
            } // while

            // write to workbook and close it
            workbook.write(fileOut);
            fileOut.flush();
            fileOut.close();

        } catch (FileNotFoundException e) {
            log.error(e);
        } catch (IOException e) {
            log.error(e);
        }

        if (rowNum < 0) {
            log.warn("No rows were written to the excel file.");
            log.warn("Application Logs provided may be out of the Wily alert window if no exception was thrown.");

        } else {
            log.info("Success");
        }

        // Returns the rows processed
        return rowNum;
    }
}

Upvotes: 0

Andrew Logvinov
Andrew Logvinov

Reputation: 21831

I see the following algorithm:

  1. Read the file line after line.
  2. If line contains CS Label string, process it (discard otherwise).
  3. Extract everything between square brackets (you can use regex for this, or just plain String methods).
  4. Split this string on comma, trim spaces.
  5. Add this info to some structure and publish to Excel.

Upvotes: 1

Related Questions