Madivad

Reputation: 3337

parse file and date manipulation in bash with a large file

I'm looking for a better way to convert a date format into the one I want. I can do it, but I have to process the file several times because I cannot get date to do it in one pass.

The format I have:
Wed Jan 30 08:00:00 2019 : misc data

The format I want:
30/01/2019 08:00:00 : misc data

However, I am only able to get date to process the date info if it is in the format:
30-Jan-2019 08:00:00 : misc data

(note: misc data is a long string containing many unwieldy characters)

To achieve what I want I am using:

awk '{("date --date="$3"-"$2"-"$5"\\ "$4" +%F") | getline $1;$2="";$3="";$4;$5=""} 1' oldfile | tr -s ' ' > newfile

This builds a date string that date can parse, reads the result into field $1, clears fields 2, 3, and 5, and prints the line (keeping the time in field 4 and the misc data); tr then squeezes the extra spaces left by the blanked fields, and the result is saved to a new file. I then have to manipulate the format, including the separators (because date doesn't like / when the month is given by name), into a new format, and the whole process is becoming too complicated.

I then run another awk over it swapping fields and separators around.

I'm sure this can be streamlined, but it's starting to confuse me now.

I do realise I should be using the output format of date, but because there are slashes involved, as soon as I include single or double quotes, or try to escape them, I find that anything involving multiple format elements fails.

To make it worse, this all works on a limited set of data (usually a sample taken with head or tail), but the original file is some 20,000 entries long and it fails at FNR=1043 with "too many open files". Only one file is open for reading and one for writing, so I think this is a result of using getline. Is there a way to do this without it?

Upvotes: 1

Views: 156

Answers (2)

stack0114106

Reputation: 8711

Another awk approach, which calls date with a quoted format and closes the command after each getline:

$ echo 'Wed Jan 30 08:00:00 2019 : misc data' | awk -F: -v OFS=: ' { t=$NF; NF--;
    cmd="date -d\047" $0 "\047 \047+%d/%m/%Y %T\047"; line="";
    cmd | getline line; close(cmd); print line,t}'
30/01/2019 08:00:00: misc data
$

Upvotes: 0

Ed Morton

Reputation: 203413

You don't need to call date just to shuffle text around:

$ echo 'Wed Jan 30 08:00:00 2019 : misc data' |
awk '{
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",$2)+2)/3
    date = sprintf("%02d/%02d/%04d %s", $3, mthNr, $5, $4)
    sub(/^([^ ]+ +){5}/,"")
    print date, $0
}'
30/01/2019 08:00:00 : misc data
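
The month lookup above relies on index() arithmetic, which can be checked in isolation. The following is only a small sketch; month_number is a hypothetical helper name introduced for this illustration and is not part of the answer's script.

```shell
# month_number is a hypothetical helper, used here only to illustrate the lookup.
month_number() {
    awk -v m="$1" 'BEGIN {
        # index() returns the 1-based position of m in the string, or 0 if absent.
        pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", m)
        # Month names start at positions 1, 4, 7, ..., 34,
        # so (pos + 2) / 3 maps them to 1..12.
        print (pos + 2) / 3
    }'
}

month_number Jan   # prints 1
month_number Dec   # prints 12
```

An unrecognised month name gives index() = 0, so the expression evaluates to a fraction below 1, which the %02d in the sprintf above would print as 00.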

By the way, the "too many open files" error you got is because you aren't closing the pipe after every invocation of getline: each distinct command string opens a new pipe, and it stays open until you close() it. See http://awk.freeshell.org/AllAboutGetline for when and how to use getline robustly.
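
If calling date is still preferred, the pipe can be closed after each record. This is a minimal sketch, not a definitive implementation: convert_dates is a hypothetical function name, it assumes GNU date (for --date), and it assumes the first five fields of every line are a well-formed timestamp, since they are pasted into a shell command.

```shell
# convert_dates is a hypothetical wrapper; assumes GNU date is installed.
convert_dates() {
    awk '{
        # \047 is a single quote, so the date string and the format
        # (which contains slashes and a space) each reach date as one word.
        cmd = "date --date=\047" $3 " " $2 " " $5 " " $4 "\047 \047+%d/%m/%Y %T\047"
        stamp = ""
        if ((cmd | getline stamp) <= 0)
            stamp = "INVALID-DATE"      # placeholder when date prints nothing
        close(cmd)                      # release the pipe before the next record
        # Strip the five original date fields, leaving the misc data untouched.
        for (i = 1; i <= 5; i++)
            sub(/^[^ ]+ +/, "")
        print stamp, $0
    }' "$@"
}
```

It could be run as convert_dates oldfile > newfile. It still starts one date process per line, so on a 20,000-line file the pure-awk version above is likely to be much faster.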

Upvotes: 3
