Reputation: 3327
I'm trying to create a regex serde in hive to read some log files but am having issue getting it to work...
the log file looks like this...
14.196.202.16:9123 11329 2016-01-27 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message> <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
I got this far:
([^ ]*)\t(-|[0-9]*)\t
and get this back:
Match 1
1. 14.196.202.16:9123
2. 11329
which gives me the first two back correctly...but when I add the date in like this:
([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t
I get this back:
Match 1
1. 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService
2.
3. Failure
I'm very new to regex and am trying to figure this out but am having trouble...I'm also trying to use this site:
essentially I'm trying to get it to look like this:
1. 14.196.202.16:9123
2. 11329
3. 2016-01-27 17:50:26.965 -5
4.
5.
6.
7.
8. Thread-14960
9. CCS
10. 6104
11. 1
12. Audit.rds.CCS
13.
14. reportDataService
15.
16. Failure
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>
19. <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
EDIT:
so I think I'm on the right track here:
I have this now:
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t
but I still can't get the <message>
and the <trace>
to group.
Upvotes: 0
Views: 148
Reputation: 3327
I got the regex to work...here is what I ended up going with
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
Upvotes: 1