Reputation: 31
How can we parse logs like the one below using Scala?
I want to read this kind of data and load it into a Hive table.
log timestamp=“2018-04-06T22:43:19.565Z” eventCategory=“Application” eventType=“Error”
The log contents are actually wrapped in an HTML-style tag of the form < />.
Upvotes: 0
Views: 673
Reputation: 191874
Why not just load the logs into Hive as-is? You can use a RegexSerDe in Hive.
Make a directory
hdfs dfs -mkdir -p /some/hdfs/path
Make a table
DROP TABLE IF EXISTS logdata;
CREATE EXTERNAL TABLE logdata (
-- timestamp is a reserved word in recent Hive versions, so it is quoted with backticks
`timestamp` STRING,
eventCategory STRING,
eventType STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "log timestamp=\“([^ ]*)\” eventCategory=\“([^ ]*)\” eventType=\“([^ ]*)\”",
"output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE
LOCATION '/some/hdfs/path/';
Upload your logs
hdfs dfs -copyFromLocal data.log /some/hdfs/path/
Query the table
SELECT * FROM logdata;
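If you still want to do the parsing in Scala, as the question asks, the same pattern can be applied with `scala.util.matching.Regex`. This is only a minimal sketch: it assumes the file really contains the curly quotes shown in the sample line, and `LogEvent`, `LogParser` and `parseLine` are hypothetical names made up for illustration.

```scala
import scala.util.matching.Regex

// Hypothetical container for the three captured fields.
final case class LogEvent(timestamp: String, eventCategory: String, eventType: String)

object LogParser {
  // Same pattern as input.regex in the table definition; the curly quotes
  // need no escaping inside a Scala triple-quoted string.
  val LogLine: Regex =
    """log timestamp=“([^ ]*)” eventCategory=“([^ ]*)” eventType=“([^ ]*)”""".r

  // A Regex extractor only succeeds when the whole line matches,
  // so anything else falls through to None.
  def parseLine(line: String): Option[LogEvent] = line match {
    case LogLine(ts, category, eventType) => Some(LogEvent(ts, category, eventType))
    case _                                => None
  }

  def main(args: Array[String]): Unit = {
    val sample =
      "log timestamp=“2018-04-06T22:43:19.565Z” eventCategory=“Application” eventType=“Error”"
    println(parseLine(sample))
  }
}
```

Note that the match is against the entire line, so if each record is wrapped in a `< />` tag, the pattern would have to be extended to cover the wrapper as well. The same holds for the Hive table: the RegexSerDe returns NULL columns for rows that do not match `input.regex`.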
Upvotes: 1