lightweight

Reputation: 3327

Regex SerDe for reading log files in Hive

I'm trying to create a regex SerDe in Hive to read some log files, but I'm having trouble getting it to work.

The log file looks like this:

14.196.202.16:9123  11329   2016-01-27 17:50:26.965 -5                  Thread-14960    CCS 6104    1   Audit.rds.CCS       reportDataService       Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

I got this far:

([^ ]*)\t(-|[0-9]*)\t

and get this back:

Match 1
1.  14.196.202.16:9123
2.  11329

That returns the first two fields correctly, but when I add the date like this:

([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t

I get this back:

Match 1
1.  17:50:26.965    -5                    Thread-14960    CCS    6104    1    Audit.rds.CCS        reportDataService
2.   
3.  Failure
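
One likely explanation, sketched in Python on a shortened, made-up version of the line (the sample string and variable names are assumptions, not taken from the real log): `[^ ]` excludes only the space character, so it can run straight across tabs, while the timestamp contains a space that `[^ ]*` cannot cross. The pattern therefore fails at the start of the line and the engine finds a match further along instead. Using `[^\t]*` keeps each group inside one field. Python's `re` is used here for convenience; Hive's RegexSerDe uses Java regular expressions, where these constructs behave the same way.

```python
import re

# Hypothetical, shortened sample: tab-separated fields, a space inside the
# timestamp, and one empty field (two tabs in a row) before "Failure".
line = (
    "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\t"
    "Thread-14960\tCCS\t\tFailure\t<messages/>"
)

# Pattern from the question: [^ ] stops at spaces only, not at tabs.
loose = re.compile(r"([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t")

# Alternative: [^\t] stops at the next tab, so each group is one field.
strict = re.compile(r"([^\t]*)\t([^\t]*)\t([^\t]*)\t")

loose_groups = loose.search(line).groups()
strict_groups = strict.search(line).groups()

print(loose_groups)   # starts after the space in the timestamp
print(strict_groups)  # the first three fields of the line
```

With the stricter class, the third group holds the whole timestamp, space included.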

I'm very new to regex and am having trouble figuring this out. I'm also using this site to test my patterns:

http://rubular.com/

Essentially, I'm trying to get it to look like this:

1. 14.196.202.16:9123   
2. 11329    
3. 2016-01-27 17:50:26.965 -5
4. 
5. 
6. 
7. 
8. Thread-14960 
9. CCS  
10. 6104    
11. 1   
12. Audit.rds.CCS   
13. 
14. reportDataService   
15. 
16. Failure 
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   
19. <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

EDIT:

I think I'm on the right track. I have this now:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t

But I still can't get the <messages> and <trace> parts to group.
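
A minimal sketch of one way to capture the trailing XML fields, assuming the record is a single tab-separated line and that the stack trace may itself contain tabs: build one `([^\t]*)` group per field and let the last group be `(.*)` so it takes the rest of the line. The helper `tab_pattern`, the field count of 7, and the sample line are hypothetical and much shorter than the real log.

```python
import re


def tab_pattern(field_count: int) -> str:
    """Return a regex with one capture group per tab-separated field.

    Every group except the last stops at the next tab; the last group
    captures the remainder of the line, tabs included.
    """
    leading = [r"([^\t]*)"] * (field_count - 1)
    return r"\t".join(leading + [r"(.*)"])


# Hypothetical, shortened sample line with a tab inside the trace.
line = (
    "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\tFailure\t"
    "<messages><message>RDS-ERR-1047</message></messages>\t"
    "<trace>java.net.SocketException: Broken pipe\t"
    "at java.net.SocketOutputStream.write(Unknown Source)</trace>"
)

fields = re.fullmatch(tab_pattern(7), line).groups()

for number, value in enumerate(fields, start=1):
    print(number, repr(value))
```

The generated pattern string can be printed and pasted into a tester such as Rubular; the field count would need to match the real log layout.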

Upvotes: 0

Views: 148

Answers (1)

lightweight

Reputation: 3327

I got the regex to work. Here is what I ended up going with:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
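
A note on reading the pattern above, checked with a small Python sketch (variable names and sample strings are illustrative assumptions): inside square brackets, `+` is a literal character, so `[\d+]` matches exactly one character that is a digit or a plus sign, and `(-[\d+])` would not match an offset such as `-12`. Likewise, adding letters, digits, or `_` to a class that already contains `\S` does not change what it matches.

```python
import re

# [\d+] is a character class: one character, either a digit or a literal '+'.
one_digit = re.fullmatch(r"[\d+]", "7") is not None
plus_sign = re.fullmatch(r"[\d+]", "+") is not None
two_digits = re.fullmatch(r"[\d+]", "42") is not None

# Outside brackets, \d+ means "one or more digits".
digits_run = re.fullmatch(r"\d+", "42") is not None

# (-[\d+]) therefore accepts "-5" but not a two-digit offset.
offset_5 = re.fullmatch(r"-[\d+]", "-5") is not None
offset_12 = re.fullmatch(r"-[\d+]", "-12") is not None

# [a-zA-Z0-9_\S] matches the same characters as \S alone, because letters,
# digits, and '_' are already non-whitespace.
samples = ["Thread-14960", "Audit.rds.CCS", "", "a b"]
verbose = re.compile(r"[a-zA-Z0-9_\S]*")
simple = re.compile(r"\S*")
same = all(
    (verbose.fullmatch(s) is None) == (simple.fullmatch(s) is None)
    for s in samples
)

print(one_digit, plus_sign, two_digits, digits_run, offset_5, offset_12, same)
```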

Upvotes: 1
