user3248750


Td-agent rename and import older logs

A few days ago our main Fluentd aggregator instance was mistakenly re-created. As a result, our td-agent buffer instances lost their connection to it for about 12 hours and accumulated a large backlog of logs. After fixing the connectivity and restarting td-agent a few times, we got everything working correctly again, but only after about 2 days.

The problem is that many of the logs from those 2 days were never sent by the td-agent instance to the aggregator once it resumed working, so we have lost a large amount of tracking data for that period. Here's our td-agent.conf:

<source>
  # legacy trackevent schema: missing field1-field7
  type tail
  log_level error
  path /var/log/php-fpm/track_access_hlog
  pos_file /var/log/php-fpm/legacy-trackevent.pos
  time_format %d/%b/%Y:%H:%M:%S
  format /^\[(?<dateday>[^ ]+) .+\] (?:\S+) \/(?<event>(?!(.*PING)|(PUSH_ALL)|(PUSH_ARRIVAL)|(PUSH_MUTE)|(PUSH_NOT)|(PUSH_RECEIVED)|(PUSH_SENT)|(VIDEO.*))[^\/]+)\/(?:[^\/]*)\/(?<userid>[^\/]*)\/(?<session>[^\/]*)\/(?<broadcastid>[^\/]*)\/(?<doorid>[^\/]*)\/(?<userlevel>[^\/]*)\/(?<broadcastscount>[^\/]*)\/(?<unspentcoins>[^\/]*)\/(?:[^\/]*)\/(?<extradata>[^\/]*)\/(?<coins>[^\/]*)\/(?<points>[^\/]*)\/(?<platform>[^\/]*)\/(?<sourceid>[^\/]*)\/(?<domain>[^\/]*)\/trpxl\.gif.*$/
  time_key dateday
  tag tracklive.ip
</source>
<source>
  # "current" trackevent schema: requires field1-field7
  type tail
  log_level error
  path /var/log/php-fpm/track_access_hlog
  pos_file /var/log/php-fpm/track-access.log.pos # This is where you record file position
  time_format %d/%b/%Y:%H:%M:%S
  format /^\[(?<dateday>[^ ]+) .+\] (?:\S+) \/(?<event>(?!(.*PING)|(PUSH_ALL)|(PUSH_ARRIVAL)|(PUSH_MUTE)|(PUSH_NOT)|(PUSH_RECEIVED)|(PUSH_SENT)|(VIDEO.*))[^\/]+)\/(?:[^\/]*)\/(?<userid>[^\/]*)\/(?<session>[^\/]*)\/(?<broadcastid>[^\/]*)\/(?<doorid>[^\/]*)\/(?<userlevel>[^\/]*)\/(?<broadcastscount>[^\/]*)\/(?<unspentcoins>[^\/]*)\/(?:[^\/]*)\/(?<extradata>[^\/]*)\/(?<coins>[^\/]*)\/(?<points>[^\/]*)\/(?<platform>[^\/]*)\/(?<sourceid>[^\/]*)\/(?<domain>[^\/]*)\/(?<field1>[^\/]*)\/(?<field2>[^\/]*)\/(?<field3>[^\/]*)\/(?<field4>[^\/]*)\/(?<field5>[^\/]*)\/(?<field6>[^\/]*)\/(?<field7>[^\/]*)\/trpxl\.gif.*$/
  time_key dateday
  tag tracklive.ip
</source>
<match tracklive.**>
  @type forward
  @id forward_output
  phi_failure_detector false
  send_timeout 10s
  expire_dns_cache 60s
  buffer_queue_limit  256
  buffer_chunk_limit  16m
  buffer_type  file
  buffer_path  /var/log/td-agent/buffer/
  <server>
    name fluentd-aggregator-box
    host fluentd.ourdomainxxxx.com
  </server>
</match>
<source>
  type tail
  log_level   error
  format      /^\[(?<dateday>[^ ]+) .+\][^\/]+\/(?<event>PUSH[^\/]*)\/[^\/]*\/(?<userid>[^\/]*)\/(?<session>[^\/]*)\/(?<broadcastid>[^\/]*)\/(?<doorid>[^\/]*)\/(?<userlevel>[^\/]*)\/[^\/]*\/(?<unspentcoins>[^\/]*)\/[^\/]*\/(?<push_origin>[^\/]*)\/[^\/]*\/[^\/]*\/(?<platform>[^\/]*)\/(?<sourceid>[^\/]*)\/(?<deviceid>[^\/]*)\/(?<push_type>[^\/]*)\/(?<channel_id>[^\/]*)\/(?<reserved128>[^\/]*)\/(?<reserved256>[^\/]*)\/(?<mobile_app_version>[^\/]*)\/(?<language>[^\/]*)\/(?<legacy_app_version>[^\/]*)\/trpxl.gif.*/
  path        /var/log/php-fpm/track_access_hlog
  pos_file    /var/log/php-fpm/push-event.pos
  time_format %d/%b/%Y:%H:%M:%S  # nginx default
  tag         push_events.ip
</source>
<match push_events.**>
  type forward
  phi_failure_detector false
  send_timeout 36000s
  expire_dns_cache 60s
  buffer_queue_limit  1800
  buffer_chunk_limit  20M
  buffer_type  file
  buffer_path  /var/log/td-agent/buffer_push_events/
  <server>
    host fluentd.ourdomainxxxx.com
  </server>
</match>

Currently it is successfully reading from the source, creating buffer .log files in the buffer folder (for example buffer_ping_events) and sending them to our aggregator box. However, the older files, from May 9th to May 11th, sit in the same folder but are never picked up and sent to the aggregator.
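To see exactly which chunks are stranded before renaming anything, a minimal sketch like the one below lists the .log files directly inside a buffer directory whose modification time falls inside a given window, oldest first. It assumes GNU find (for -newermt and -printf) and that each file's modification time roughly reflects when the chunk was written; the function name list_chunks_between is hypothetical, and files not ending in .log are skipped.

```shell
#!/usr/bin/env bash
# list_chunks_between DIR START END   (hypothetical helper)
# Prints "YYYY-MM-DD HH:MM<TAB>filename" for each regular *.log file directly
# inside DIR whose mtime is later than START and not later than END,
# sorted oldest first. START/END are dates such as "YYYY-MM-DD".
# Requires GNU find: -newermt and -printf are GNU extensions.
list_chunks_between() {
  local dir="$1" start="$2" end="$3"
  find "$dir" -maxdepth 1 -type f -name '*.log' \
    -newermt "$start" ! -newermt "$end" \
    -printf '%TY-%Tm-%Td %TH:%TM\t%f\n' | sort
}
```

Call it with one of the buffer directories from the config (for example /var/log/td-agent/buffer_push_events) and the start and end dates of the outage window. Since it only reads metadata, it is safe to run while td-agent is up.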

Since the buffer file names seem to follow a sequential hexadecimal numbering, if I rename the old files with a higher number, will td-agent pick them up and send them? If so, could that cause problems later on?

Thanks in advance

I changed some configuration and restarted td-agent, expecting all the log files to be sent, but only newly generated ones are processed. I also tried forcing the buffered files to flush with this command:

kill -USR1 $(cat /var/run/td-agent/td-agent.pid)

It had no effect.
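A more defensive version of that flush command fails loudly instead of silently. Fluentd documents SIGUSR1 as forcing buffered messages to be flushed; this sketch reads the PID file (defaulting to the path used above), checks with kill -0 that the process exists, and only then sends the signal. The function name flush_td_agent is hypothetical.

```shell
#!/usr/bin/env bash
# flush_td_agent [PIDFILE]   (hypothetical helper)
# Sends SIGUSR1 to the process whose PID is stored in PIDFILE.
# Default PIDFILE is the path from the question; adjust if yours differs.
flush_td_agent() {
  local pidfile="${1:-/var/run/td-agent/td-agent.pid}"
  local pid
  # Read the PID; fail early if the file is missing or unreadable.
  pid="$(cat "$pidfile" 2>/dev/null)" || { echo "cannot read $pidfile" >&2; return 1; }
  [ -n "$pid" ] || { echo "$pidfile is empty" >&2; return 1; }
  # kill -0 sends no signal; it only checks that the process exists
  # and that we are allowed to signal it.
  kill -0 "$pid" 2>/dev/null || { echo "no process with PID $pid" >&2; return 1; }
  kill -USR1 "$pid"
}
```

Run it as the same user as td-agent (or as root) so the permission check in kill -0 passes.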
