unchained
unchained

Reputation: 29

How can I convert date in a column to a epoch timestamp in bash?

I have a file containing this:

2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

I want to convert the timestamp to a date in this format:

1415602803,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415602807,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415602811,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

How can I achieve this using awk or any other bash tool.

Background: I need to grab files from hdfs for a particular time range. Initially I was using hdfs dfs -stat to bring epoch time and then programming a filter

cat <fileWithFilePaths> | xargs -I ^ -P 15 sh -c 'printf "%s\n" "$(hdfs dfs -stat "%n,%Y,%b" ^)"'>output1

awk 'BEGIN{beginDE=ENVIRON["beginDataEpoch"]; endDE=ENVIRON["endDataEpoch"]; FS="@"} {if ($1 >= beginDE && $1 <= endDE) print $0}' output1

But later realized its using hdfs stat is expensive function. So now I am using hdfs dfs -ls to bring the file time but its in yyyy-mm-dd hh:mm format, thus I am seeking help in regards to change this in epoch time for calculations down the line.

Upvotes: 0

Views: 1910

Answers (4)

Simon Nebesky
Simon Nebesky

Reputation: 51

I had the same issue, when I needed to convert the date to unix time in some historical stock data. Used this sed and date command. Please note, that this only works on Linux (GNU implementation of sed and date).

simon@debian:~/Downloads$ cat inputFile
2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$ sed 's/20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/echo `date --date="&" +"%s"`/e' inputFile > output
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$
simon@debian:~/Downloads$ cat output 
1415602803,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415602807,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415602811,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

Upvotes: 0

David C. Rankin
David C. Rankin

Reputation: 84551

You can use awk along with its builtin gensub, mktime and sub to accomplish the same thing, e.g.

$ awk -F, '{ str=$0; sub($1, mktime(gensub(/[-:]/, " ", "g", $1)), str); print str }' file
1415606403,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415606407,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415606411,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

Explanation:

  • gensub(/[-:]/, " ", "g", $1) - replace all '-' and ':' in the first field with spaces;
  • mktime(...) - turn the resulting date into a timestamp;
  • str=$0; sub($1, ..., str); - replace the first field with the timestamp, result in str; and finally
  • print str - print it.

(note: Your timezone and my are 3-hours different. Your output should give you your desired results on your system. You can also adjust mktime for Daylight Savings (if needed) and you can use the builtin strftime with the utc-flag to adjust for UTC.)

Upvotes: 1

xhienne
xhienne

Reputation: 6134

If your log file is large, this GNU awk command will be much faster than a bash loop calling the external command date for each line:

gawk -vFS=, -vOFS=, '{ $1 = mktime(gensub("[-:]", " ", "g", $1)) } 1'
  • The first comma-delimited field is replaced with the result of mktime, a function (a GNU extension) that returns its parameter as a number of seconds since EPOCH.
  • gensub (a GNU extension) is used to turn "YYYY-MM-DD hh:mm:ss" into "YYYY MM DD hh mm ss" which is the format expected by mktime().
  • The final 1 is a mere TRUE condition that outputs the line.

Upvotes: 2

KamilCuk
KamilCuk

Reputation: 140970

I guess while read loop would be the simplest. The following:

while IFS=, read -r date rest; do
        printf "%s,%s\n" "$(date --date="$date" +%s)" "$rest"
done <<EOF
2014-11-10 02:00:03,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
2014-11-10 02:00:07,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
2014-11-10 02:00:11,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22
EOF

Will output:

1415581203,LOGIN SUCCESS,AUTH,user2,192.168.203.63,10.146.124.73,59996,22
1415581207,LOGIN SUCCESS,AUTH,user1,172.24.31.10,172.32.1.1,48191,22
1415581211,LOGIN FAILED,AUTH,root,172.24.166.153,10.146.124.73,52506,22

Upvotes: 0

Related Questions