Reputation: 157
I have a question about Amazon Web Services Elastic MapReduce (EMR). Is it possible to use EMR to process logs from CloudTrail? If so, could you briefly describe how it can be done?
Upvotes: 0
Views: 891
Reputation: 2343
Yes. AWS CloudTrail logs can be parsed on EMR with either Hive or Spark.
On EMR Spark: AWS Labs has open-source code that converts your AWS CloudTrail logs into a Spark DataFrame, which you can then query with SQL: https://github.com/awslabs/timely-security-analytics/blob/master/src/main/scala/CloudTrailToSQL.scala
On EMR Hive: EMR clusters include a CloudTrail SerDe designed to parse CloudTrail logs. Its classes are part of /usr/share/aws/emr/goodies/lib/EmrHadoopGoodies-x.jar and /usr/share/aws/emr/goodies/lib/EmrHiveGoodies-x.jar, which are automatically included in the Hive classpath. Hive can also decompress the GZ files automatically, so all you need to do is run a query, much as you would a SQL command. On read, the data is processed by the CloudTrailInputFormat implementation, which defines the input splits and the key/value records. The CloudTrailLogDeserializer class defined in the SerDe is then called to format each record into a row that maps to the columns and data types of a table. On write (for example, with an INSERT statement), the Serializer class defined in the SerDe translates the data into the format that the OUTPUTFORMAT class (HiveIgnoreKeyTextOutputFormat) can read.
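To make the read path more concrete, here is a minimal Python sketch of the same idea: decompress a log file, take each entry of the top-level "Records" array, and map it onto a fixed set of columns. This is only an illustration and not the EMR SerDe itself; the function name cloudtrail_rows, the column selection, and the sample events are all made up for the example.

```python
import gzip
import json

# Columns chosen for illustration; real CloudTrail records carry more fields.
COLUMNS = ("eventTime", "eventSource", "eventName", "awsRegion", "sourceIPAddress")


def cloudtrail_rows(gz_bytes):
    """Decompress one CloudTrail log file and yield one tuple per record."""
    doc = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    for record in doc.get("Records", []):
        # Missing fields become None, much like NULL columns in a table.
        yield tuple(record.get(col) for col in COLUMNS)


# Tiny made-up log file, standing in for an object read from S3.
sample = gzip.compress(json.dumps({
    "Records": [
        {"eventTime": "2016-01-01T00:00:00Z", "eventSource": "s3.amazonaws.com",
         "eventName": "GetObject", "awsRegion": "us-east-1",
         "sourceIPAddress": "192.0.2.10"},
        {"eventTime": "2016-01-01T00:05:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "RunInstances", "awsRegion": "us-west-2"},
    ]
}).encode("utf-8"))

rows = list(cloudtrail_rows(sample))
for row in rows:
    print(row)
```

Once the records are in this row-and-column shape, filtering or aggregating them (for example, by eventName or awsRegion) is what the Hive or Spark SQL query would then do at scale.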
Upvotes: 1