Reputation: 843
Good day,
I am running a Flink (v1.7.1) streaming job on AWS EMR 5.20, and I would like to have the logs of all task managers and the job manager in S3. Logback is used, as recommended by the Flink team. As it is a long-running job, I want the logs to be:
This is what I have tried:
<appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${log.file}</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>%d{yyyy-MM-dd HH}.%i.log</fileNamePattern>
        <maxFileSize>30MB</maxFileSize>
        <maxHistory>3</maxHistory>
        <totalSizeCap>50MB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n</pattern>
    </encoder>
</appender>
This is what I have observed so far:
In short, out of my 3 requirements, I could only achieve either (1) or (2 & 3), never all three at once.
Could you please help me with this?
Thanks and best regards,
Averell
Upvotes: 1
Views: 1857
Reputation: 9245
From what I know, the auto-backup of logs to S3 that EMR supports only works at the end of the job, since it's based on the background log-loader that AWS originally implemented for batch jobs. Maybe there's a way to get it to work for rolling logs, but I've never heard of one.
I haven't tried this myself, but if I had to then I'd probably try the following:
- Mount an S3 bucket on each node using S3fs.
- Use logrotate (or equivalent) to automatically copy and clean up the log files.
- Use a bootstrap action to automatically set up all of the above.
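To make the second point concrete, here is a sketch of what such a logrotate rule could look like, assuming the S3 bucket is mounted at /mnt/s3 via S3fs. All paths, sizes, and retention counts here are illustrative assumptions, not anything EMR or Flink prescribes:

```
# /etc/logrotate.d/flink (sketch; every path below is an assumption)
/mnt/var/log/flink/*.log {
    size 30M
    rotate 3
    missingok
    copytruncate
    sharedscripts
    postrotate
        # copy the freshly rotated files onto the S3fs mount
        cp /mnt/var/log/flink/*.log.1 /mnt/s3/flink-logs/
    endscript
}
```

A bootstrap action would then just install S3fs, mount the bucket, and drop this file into /etc/logrotate.d/ on every node.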
If S3fs gives you problems, then you can do a bit more scripting and directly use the aws s3 command to sync logs, then remove them once they've been copied.
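A minimal sketch of that scripted approach, run periodically (e.g. from cron): upload each rotated log file with the aws CLI and delete the local copy only if the upload succeeded. The log directory, bucket name, and file pattern are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: ship rotated Flink logs to S3, then delete the local copies.
# The directory, bucket, and *.log.* pattern are assumptions for illustration.

upload_rotated_logs() {
  local log_dir=$1 bucket=$2 f
  for f in "$log_dir"/*.log.*; do
    [ -e "$f" ] || continue                      # glob matched nothing
    if aws s3 cp "$f" "$bucket/$(basename "$f")"; then
      rm -f "$f"                                 # remove only after a successful upload
    fi
  done
}

# Example cron-style invocation (hypothetical paths):
# upload_rotated_logs /mnt/var/log/flink "s3://my-log-bucket/flink/$(hostname)"
```

Leaving the active *.log file alone and only shipping rotated files avoids racing with the appender that is still writing.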
Upvotes: 0