shantanuo

Reputation: 32316

Split a string so its values can be inserted into a database

I have a text file containing many lines like this one:

Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1

I need to insert the following values into a database, so I first need to extract them from each line:

1) logger
2) submit date
3) done date
4) stat
5) err

The following pipeline isolates the logger string:

tail  messages | grep logger: | awk -F'logger: ' '{print $2}' | awk '{print $1}'

Is this the right way to split the string, or is there a better option?

Upvotes: 0

Views: 113

Answers (2)

captcha

Reputation: 3756

If you put the keywords in a file, this will work (code for GNU sed):

 sed -nr 's#.*#h;s/.*(&):\\s*(\\w+).*/\\1:\\2/p;g#p' file2|sed -nrf - file1

Example:

$ cat file1
Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1

$ cat file2
logger
submit date
done date
stat
err

$ sed -nr 's#.*#h;s/.*(&):\\s*(\\w+).*/\\1:\\2/p;g#p' file2|sed -nrf - file1
logger:1
submit date:1307130919
done date:1307130919
stat:DELIVRD
err:0
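
For comparison, the same keyword-driven idea can be sketched in Python. This is only an illustration, not part of the answer above: `extract_fields` is a hypothetical helper name, and it assumes each keyword occurs once per line (it takes the first occurrence, whereas the greedy `.*` in the sed script would pick the last).

```python
import re

# Hypothetical helper for illustration; not from the original answer.
def extract_fields(line, keywords):
    """Return {keyword: value} for each keyword followed by ':' and one word."""
    fields = {}
    for key in keywords:
        # Same shape as the generated sed pattern: key, colon,
        # optional whitespace, then a run of word characters.
        match = re.search(re.escape(key) + r":\s*(\w+)", line)
        if match:
            fields[key] = match.group(1)
    return fields

line = ("Jul 15 12:12:51 whitelist logger: 1|999999999999|"
        "id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 "
        "done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|"
        "vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1")
keywords = ["logger", "submit date", "done date", "stat", "err"]

print(extract_fields(line, keywords))
```

Reading the keywords from a file instead of a hard-coded list would mirror the file2 setup used here.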

Upvotes: 3

calmrat

Reputation: 7962

There are many ways to accomplish this in Python. One simple approach is to use Python's built-in regular expression module, re. Assuming the log output always follows the format shown, you could extract the parts of interest like this:

import re

s = "Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1"

# Raw strings keep the backslashes intact for the regex engine;
# adjacent string literals are concatenated into one pattern.
logger_re = re.compile(
    r"logger: ([^ ]+)"
    r" submit date:(\d+)"
    r" done date:(\d+)"
    r" stat:(.+)"
    r" err:(.+)$")

# Python 3 print function
print(logger_re.search(s).groups())

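The .groups() method returns a tuple of the strings captured by the parenthesized groups. Note that the first group captures everything up to the first space after "logger: ", and the last group captures the rest of the line after "err:", so those two values may need further trimming.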

See http://docs.python.org/2/library/re.html
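
Since the goal is to load the values into a database, here is a minimal sketch of that last step using Python's standard sqlite3 module. Everything about the storage side is an assumption made for illustration: the in-memory database, the `delivery` table, its column names, and the `parse_line` helper do not come from the question. The pattern is a tightened variant of the one above, so that the logger and err groups stop at the field boundary.

```python
import re
import sqlite3

# Tightened variant of the pattern: logger stops at the first '|',
# stat is a single word, err is a run of digits.
FIELD_RE = re.compile(
    r"logger: ([^|]+)\|.*?"
    r" submit date:(\d+)"
    r" done date:(\d+)"
    r" stat:(\w+)"
    r" err:(\d+)")

def parse_line(line):
    """Hypothetical helper: return the five fields as a tuple, or None."""
    m = FIELD_RE.search(line)
    return m.groups() if m else None

line = ("Jul 15 12:12:51 whitelist logger: 1|999999999999|"
        "id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 "
        "done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|"
        "vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1")

# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivery ("
    "logger TEXT, submit_date INTEGER, done_date INTEGER, "
    "stat TEXT, err INTEGER)")

row = parse_line(line)
if row is not None:
    # Placeholders let the driver handle quoting of the values.
    conn.execute("INSERT INTO delivery VALUES (?, ?, ?, ?, ?)", row)
    conn.commit()

rows = conn.execute("SELECT * FROM delivery").fetchall()
print(rows)
```

For a real log file, the same `parse_line` call could be applied to each line in a loop, skipping lines for which it returns None.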

Upvotes: 1
