shandy

Reputation: 31

Log Parsing using Scala

How can I parse logs like the one below using Scala?
I want to read this kind of data and load it into a Hive table.

log timestamp=“2018-04-06T22:43:19.565Z” eventCategory=“Application” eventType=“Error”

The log contents are actually wrapped in an HTML-style tag, < />.

Upvotes: 0

Views: 673

Answers (1)

OneCricketeer

Reputation: 191874

Why can't you just load the log data into Hive as-is, though? Use a RegexSerDe in Hive.

Make a directory

hdfs dfs -mkdir -p /some/hdfs/path

Make a table

DROP TABLE IF EXISTS logdata;

CREATE EXTERNAL TABLE logdata (
  `timestamp` STRING,
  eventCategory STRING,
  eventType STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = ".*log timestamp=[\"“”]([^ ]*)[\"“”] eventCategory=[\"“”]([^ ]*)[\"“”] eventType=[\"“”]([^ ]*)[\"“”].*",
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE
LOCATION '/some/hdfs/path/';
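
One thing to watch: as far as I know, RegexSerDe only fills the columns when the whole line matches input.regex, and rows that do not match come back as NULLs. Since the question says the real lines are wrapped in < />, it may be worth testing a candidate pattern against a sample line before creating the table. A minimal sketch in Scala; the object name RegexCheck, the helper columns, and the tolerant pattern (optional surrounding text, straight or curly quotes) are my own assumptions for illustration:

```scala
import java.util.regex.Pattern

object RegexCheck {
  // Assumption: quotes may be straight (") or curly (\u201C, \u201D), depending on how the log was produced.
  private val quote = "[\"\u201C\u201D]"

  // Hypothetical pattern: the leading and trailing .* let the line carry a surrounding tag.
  val pattern: Pattern = Pattern.compile(
    s".*log timestamp=${quote}([^ ]*)${quote} eventCategory=${quote}([^ ]*)${quote} eventType=${quote}([^ ]*)${quote}.*"
  )

  // Returns the captured groups only if the WHOLE line matches, which is the
  // same full-match requirement assumed for RegexSerDe above.
  def columns(line: String): Option[Seq[String]] = {
    val m = pattern.matcher(line)
    if (m.matches()) Some((1 to m.groupCount).map(i => m.group(i))) else None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      "<log timestamp=\"2018-04-06T22:43:19.565Z\" eventCategory=\"Application\" eventType=\"Error\" />"
    println(columns(sample))
  }
}
```

If columns returns None for a real line from your file, the table would most likely show NULLs for that row too, so adjust the pattern first.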

Upload your logs

hdfs dfs -copyFromLocal data.log /some/hdfs/path/

Query the table

SELECT * FROM logdata;
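
If you do want to parse the lines in Scala yourself, a rough sketch with plain regular expressions could look like the following. The names LogParser, LogEvent, and parse are hypothetical, and the code assumes each line carries timestamp, eventCategory, and eventType as key="value" attributes, in any order:

```scala
import scala.util.matching.Regex

object LogParser {
  // Hypothetical record type mirroring the three columns of the Hive table.
  final case class LogEvent(timestamp: String, eventCategory: String, eventType: String)

  // Matches key="value" pairs; straight and curly quotes are both accepted (assumption).
  private val attribute: Regex =
    "(\\w+)=[\"\u201C\u201D]([^\"\u201C\u201D]*)[\"\u201C\u201D]".r

  // Parses one line; returns None when any of the three attributes is missing.
  def parse(line: String): Option[LogEvent] = {
    val attrs = attribute.findAllMatchIn(line).map(m => m.group(1) -> m.group(2)).toMap
    for {
      ts  <- attrs.get("timestamp")
      cat <- attrs.get("eventCategory")
      typ <- attrs.get("eventType")
    } yield LogEvent(ts, cat, typ)
  }

  def main(args: Array[String]): Unit = {
    val line =
      "<log timestamp=\"2018-04-06T22:43:19.565Z\" eventCategory=\"Application\" eventType=\"Error\" />"
    println(parse(line))
  }
}
```

Collecting the attributes into a map first keeps the parser independent of attribute order and of the surrounding < /> wrapper. Writing the parsed records into Hive would still need something like Spark on top of this, which is why the RegexSerDe route above is usually less work.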

Upvotes: 1
