mmesojedec

Reputation: 11

Convert multiline log to single line

I have a multiline log file and I want to convert it to a single-line log.

Multiline example:

6/13/2015 12:00:47 AM - {   562} START Web 
6/13/2015 12:00:47 AM - Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562 
6/13/2015 12:01:24 AM - {   563} START POP3 
6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563 
6/13/2015 12:01:24 AM - (   563) USER [email protected] 
6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563 
6/13/2015 12:01:24 AM - {   563} END POP3
6/13/2015 12:01:24 AM - {   564} START POP3 
6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564 
6/13/2015 12:01:24 AM - (   564) USER [email protected] 
6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564 
6/13/2015 12:01:24 AM - {   564} END POP3
6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562 
6/13/2015 12:01:40 AM - {   562} END Web

To start, I would like single-line output like this, where lines with the same log ID (for example "562") are merged:

6/13/2015 12:00:47 AM - {   562} START Web 6/13/2015 12:00:47 AM - Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562 6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562 6/13/2015 12:01:40 AM - {   562} END Web
6/13/2015 12:01:24 AM - {   563} START POP3 6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563 6/13/2015 12:01:24 AM - (   563) USER [email protected]  6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563  6/13/2015 12:01:24 AM - {   563} END POP3
6/13/2015 12:01:24 AM - {   564} START POP3 6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564 6/13/2015 12:01:24 AM - (   564) USER [email protected]  6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564  6/13/2015 12:01:24 AM - {   564} END POP3

I have written the following bash script, which does not work as expected: it merges all "POP3" or "Web" messages into a single line instead of separating them by message ID.

Script:

#!/bin/bash

HOME=/var/tmp/test.txt

ID=`((awk '$6 ~/[0-9]\W/ {print $6}' $HOME | awk '{gsub (/)/, ""); print}' | awk '{gsub (/}/, ""); print}') && (awk '$11 ~/[0-9]/ {print $11}' $HOME | awk '{gsub ("ID=", ""); print}'))`


for ID in $HOME
do
        awk '!/Web/' $HOME | xargs >> final.txt
        awk '/Web/' $HOME | xargs >> final.txt
done

Any suggestions on how I should write the loop to merge only lines with the same ID?

Upvotes: 1

Views: 480

Answers (1)

John B

Reputation: 3646

You could do this with an Awk script:

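#!/usr/bin/awk -f
{
    # Bracketed lines such as "{   562}" or "(   563)" carry the ID in field 6
    if ($5 ~ /[{(]/) {
        split($6, b, /[)}]/)
        id = b[1]
    } else {
        # All other lines end with "ID=562"
        split($NF, b, "=")
        id = b[2]
    }
    # Append the whole line to the entry for this ID
    a[id] = a[id] FS $0
}
END {
    for (id in a)
        print a[id]
}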

Run like:

$ awk -f script.awk logfile
 6/13/2015 12:00:47 AM - {   562} START Web  6/13/2015 12:00:47 AM - Requested Web connection from 123.125.71.103 [123.125.71.103], ID=562  6/13/2015 12:01:40 AM - Web connection with 123.125.71.103 [123.125.71.103] ended. ID=562  6/13/2015 12:01:40 AM - {   562} END Web
 6/13/2015 12:01:24 AM - {   563} START POP3  6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=563  6/13/2015 12:01:24 AM - (   563) USER [email protected]  6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=563  6/13/2015 12:01:24 AM - {   563} END POP3
 6/13/2015 12:01:24 AM - {   564} START POP3  6/13/2015 12:01:24 AM - Requested POP3 connection from 10.127.251.37 [10.127.251.37], ID=564  6/13/2015 12:01:24 AM - (   564) USER [email protected]  6/13/2015 12:01:24 AM - POP3 connection with 10.127.251.37 [10.127.251.37] ended. ID=564  6/13/2015 12:01:24 AM - {   564} END POP3

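The script checks whether the 5th field is a { or ( character. If it is, the ID is taken from the 6th field with the closing bracket split off; otherwise it is taken from the last field, after the "=". The ID is then used as a key in the array a, and the line ($0) is appended to that key's value, preceded by the field separator (which is why each output line starts with a space). After every line has been processed, the END block prints all elements of the array. Note that the order in which for (id in a) visits the keys is not specified by Awk, so the merged lines may not come out in the order shown above.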
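If the output order matters, one possible variation is to record each ID the first time it appears and print in that order. The following is only a sketch under the same assumptions about the log layout as the script above; the function name merge_by_id and the arrays order and n are made up for illustration and are not part of the original answer. It also joins lines with a single space and skips the leading separator.

```shell
#!/bin/sh
# Sketch: merge log lines that share an ID, keeping IDs in first-seen order.
# merge_by_id is a hypothetical helper; it takes the log file path as $1.
merge_by_id() {
    awk '
    {
        if ($5 ~ /[{(]/) {
            # "{   562} START Web" style: ID is field 6 without the closing bracket
            split($6, b, /[)}]/)
            id = b[1]
        } else {
            # "... ended. ID=562" style: ID follows "=" in the last field
            split($NF, b, "=")
            id = b[2]
        }
        if (!(id in a)) {
            order[++n] = id          # remember the order of first appearance
            a[id] = $0
        } else {
            a[id] = a[id] " " $0     # append later lines for the same ID
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print a[order[i]]
    }' "$1"
}
```

It could be called as merge_by_id /var/tmp/test.txt > final.txt. Lines whose ID appears in neither position would all be grouped under an empty key, so real logs with other line formats would need an extra check.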

Upvotes: 1
