Averell

Reputation: 843

How to copy EMR streaming job logs to S3 and clean logs on EMR core node's disk

Good day,

I am running a Flink (v1.7.1) streaming job on AWS EMR 5.20, and I would like to have the logs of all of my job's task managers and the job manager in S3. Logback is used, as recommended by the Flink team. As this is a long-running job, I want the logs to be:

  1. Copied to S3 periodically
  2. Rolled by time, by size, or both (as the volume of logs may be huge)
  3. Cleaned from the local disks of the EMR nodes (otherwise the disks will fill up)

What I have tried so far:

  1. Enabled logging to S3 when creating the EMR cluster
  2. Configured rolling log aggregation in YARN with: yarn.log-aggregation-enable, yarn.nodemanager.remote-app-log-dir, yarn.log-aggregation.retain-seconds and yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
  3. Configured rolling logs in logback.xml:
    <appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${log.file}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>%d{yyyy-MM-dd HH}.%i.log</fileNamePattern>
            <maxFileSize>30MB</maxFileSize>
            <maxHistory>3</maxHistory>
            <totalSizeCap>50MB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
        </encoder>
    </appender>
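
For completeness, the YARN properties from (2) can be supplied as an EMR configuration classification like the one below; the bucket name and the retention/interval values are placeholders for illustration, not the actual values I used.

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log-aggregation-enable": "true",
      "yarn.nodemanager.remote-app-log-dir": "s3://my-log-bucket/yarn-logs",
      "yarn.log-aggregation.retain-seconds": "86400",
      "yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds": "3600"
    }
  }
]
```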

What I have observed so far:

In short, of the three requirements above, I could only satisfy either (1) on its own or (2) and (3) together, never all three at once.

Could you please help me with this?

Thanks and best regards,

Averell

Upvotes: 1

Views: 1857

Answers (1)

kkrugler

Reputation: 9245

From what I know, the automatic backup of logs to S3 that EMR supports only works at the end of a job, since it is based on the background log pusher that AWS originally implemented for batch jobs. There may be a way to make it work with rolling logs, but I have never heard of one.

I haven't tried this myself, but if I had to then I'd probably try the following:

  1. Mount S3 on your EC2 instances via s3fs.
  2. Set up logrotate (or equivalent) to automatically copy and clean up the log files.
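
Step 2 might look like the hypothetical logrotate entry below. The Flink log directory and the s3fs mount point (/mnt/s3-logs), as well as the size/rotate limits, are assumptions for the sketch, not values from the question.

```
# Hypothetical /etc/logrotate.d/flink; all paths are placeholders.
/mnt/var/log/flink/*.log {
    size 30M
    rotate 3
    missingok
    notifempty
    # copytruncate keeps the appender's open file handle valid
    copytruncate
    sharedscripts
    postrotate
        # copy the freshly rotated files onto the s3fs mount
        cp /mnt/var/log/flink/*.log.1 /mnt/s3-logs/ 2>/dev/null || true
    endscript
}
```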

You can use a bootstrap action to automatically set up all of the above.

If s3fs gives you problems, you can do a bit more scripting: use the aws s3 sync command directly to copy the logs, then remove them once they have been uploaded.
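
That scripted alternative could be sketched roughly as below, e.g. run from cron on each node. It assumes the aws CLI (preinstalled on EMR nodes); LOG_DIR and the bucket name are placeholders. For illustration the functions only print the commands a real cron job would execute.

```shell
#!/usr/bin/env bash
# Sketch of a cron-driven "sync then clean" step; paths are placeholders.
LOG_DIR="${LOG_DIR:-/mnt/var/log/flink}"
S3_DEST="${S3_DEST:-s3://my-log-bucket/flink/$(hostname)}"

# Ship only rolled files (*.log.*) so the file Flink is still writing
# is left untouched.
sync_cmd() {
  echo aws s3 sync "$LOG_DIR" "$S3_DEST" --exclude '*' --include '*.log.*'
}

# Delete rolled files more than an hour old, after they have been copied.
clean_cmd() {
  echo find "$LOG_DIR" -name '*.log.*' -mmin +60 -delete
}

# Print the two commands; a real cron job would execute them instead.
sync_cmd
clean_cmd
```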

Upvotes: 0
