ljaerj

Reputation: 157

Use EMR to process CloudTrail logs

I have a question about Amazon Web Services Elastic MapReduce (EMR). Is it possible to use EMR to process logs from CloudTrail? If so, could you briefly describe how it can be done?

Upvotes: 0

Views: 891

Answers (1)

jc mannem

Reputation: 2343

Yes. AWS CloudTrail logs can be parsed with EMR Hive or EMR Spark.

On EMR Spark: AWS Labs publishes open-source code that converts your AWS CloudTrail logs into a Spark DataFrame, which you can then query with SQL: https://github.com/awslabs/timely-security-analytics/blob/master/src/main/scala/CloudTrailToSQL.scala

On EMR Hive: EMR clusters include a CloudTrail SerDe designed to parse CloudTrail logs. Its classes are part of /usr/share/aws/emr/goodies/lib/EmrHadoopGoodies-x.jar and /usr/share/aws/emr/goodies/lib/EmrHiveGoodies-x.jar, which are automatically included in the Hive classpath. Hive can also decompress the GZ files automatically, so all you need to do is run SQL-like queries.

On read, the data is processed by the CloudTrailInputFormat implementation, which defines the input splits and the key/value records. The CloudTrailLogDeserializer class defined in the SerDe is then called to format the data into a record that maps to the columns and data types of a table. On write (for example, with an INSERT statement), the Serializer class defined in the SerDe translates the data into the format that the OUTPUTFORMAT class (HiveIgnoreKeyTextOutputFormat) accepts.
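To make the read path more concrete, here is a minimal Python sketch of the same idea. It is not the EMR SerDe code, only an illustration. A CloudTrail log file is a gzip-compressed JSON document with a top-level "Records" array, so the sketch decompresses it, yields one event per record, and maps each event onto a fixed column list. The column list, the function names, and the sample values are all assumptions made up for this example.

```python
import gzip
import json

# Hypothetical column list for illustration; real events carry many more fields.
COLUMNS = ("eventTime", "eventSource", "eventName", "awsRegion", "sourceIPAddress")


def read_cloudtrail_records(gz_bytes):
    """Decompress one CloudTrail log file and yield each event as a dict."""
    document = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    # A CloudTrail log file is one JSON object with a top-level "Records" array.
    for record in document.get("Records", []):
        yield record


def to_row(record, columns=COLUMNS):
    """Project one event onto the column list; missing fields become None."""
    return tuple(record.get(name) for name in columns)


# In-memory sample standing in for a real log file (made-up values).
sample = gzip.compress(json.dumps({"Records": [
    {"eventTime": "2016-01-01T00:00:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "GetObject", "awsRegion": "us-east-1",
     "sourceIPAddress": "192.0.2.1"},
    {"eventTime": "2016-01-01T00:05:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "RunInstances", "awsRegion": "us-west-2"},
]}).encode("utf-8"))

rows = [to_row(record) for record in read_cloudtrail_records(sample)]
for row in rows:
    print(row)
```

On a cluster, Hive or Spark does this work in parallel across many files; the sketch only shows how a single file turns into rows.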

Upvotes: 1
