Reputation: 32316
I have a text file with many lines like this one:
Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1
I need to insert these values into a database, so I need to extract the following fields from each line:
1) logger
2) submit date
3) done date
4) stat
5) err
The following works to isolate the logger value:
tail messages | grep logger: | awk -F'logger: ' '{print $2}' | awk '{print $1}'
Is this the right way to split the string? Is there a better option?
Upvotes: 0
Views: 113
Reputation: 3756
If you put the keywords in a file, one per line, this GNU sed command will work:
sed -nr 's#.*#h;s/.*(&):\\s*(\\w+).*/\\1:\\2/p;g#p' file2|sed -nrf - file1
Example:
$ cat file1
Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1

$ cat file2
logger
submit date
done date
stat
err

$ sed -nr 's#.*#h;s/.*(&):\\s*(\\w+).*/\\1:\\2/p;g#p' file2|sed -nrf - file1
logger:1
submit date:1307130919
done date:1307130919
stat:DELIVRD
err:0
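For readers less comfortable with sed: the first sed turns every keyword in file2 into a substitution command, and the second sed runs that generated script against file1. A rough Python sketch of the same keyword-driven idea follows. The function name `extract_fields` is made up for illustration, and the sketch takes the first occurrence of each keyword, whereas the greedy `.*` in the sed pattern picks the last one; for the sample line the result is the same.

```python
import re

def extract_fields(line, keywords):
    """Return {keyword: value} for each keyword followed by ':' and a word."""
    fields = {}
    for keyword in keywords:
        # Same shape as the generated sed pattern: (keyword):\s*(\w+)
        match = re.search(re.escape(keyword) + r":\s*(\w+)", line)
        if match:
            fields[keyword] = match.group(1)
    return fields

line = ("Jul 15 12:12:51 whitelist logger: 1|999999999999|"
        "id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 "
        "done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|"
        "vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1")
keywords = ["logger", "submit date", "done date", "stat", "err"]

for key, value in extract_fields(line, keywords).items():
    print(key + ":" + value)
```

Because `\w+` stops at the first non-word character, `logger` yields only `1` here, matching the sed output above.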
Upvotes: 3
Reputation: 7962
There are many ways to accomplish this in Python. One simple approach is to use Python's built-in regular expression module, re. Assuming the log output always follows the format shown, you could extract the parts of interest like this:
import re
s = "Jul 15 12:12:51 whitelist logger: 1|999999999999|id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1"
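# Adjacent raw-string literals are joined into one pattern; each piece
# ends with the single space that separates the fields in the log line.
logger_re = re.compile(
    r"logger: (\S+) "
    r"submit date:(\d+) "
    r"done date:(\d+) "
    r"stat:(\S+) "
    r"err:(\d+)")
# search() returns None for lines that do not match, so check first.
match = logger_re.search(s)
if match:
    print(match.groups())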
The .groups() method returns a tuple of the strings captured by the parenthesized groups in the pattern.
See http://docs.python.org/2/library/re.html
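Since the goal is to load these values into a database, it may be more convenient to get each line back as a dictionary. Here is a minimal sketch using named groups. The group names and the `parse_line` helper are illustrative choices, not part of the log format, and the pattern assumes the fields always appear in the order shown in the question.

```python
import re

# One named group per field the question asks for.
LOG_RE = re.compile(
    r"logger: (?P<logger>\S+) "
    r"submit date:(?P<submit_date>\d+) "
    r"done date:(?P<done_date>\d+) "
    r"stat:(?P<stat>\S+) "
    r"err:(?P<err>\d+)"
)

def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_RE.search(line)
    return match.groupdict() if match else None

lines = [
    "Jul 15 12:12:51 whitelist logger: 1|999999999999|"
    "id:d9faff7c-4016-4343-b494-37028763bb66 submit date:1307130919 "
    "done date:1307130919 stat:DELIVRD err:0|L_VB3_NM_K_P|1373687445|"
    "vivnel2|L_VB3_GH_K_P|promo_camp1-bd153424349bc647|1",
    # Hypothetical unrelated line, included to show that non-matches are skipped.
    "Jul 15 12:12:52 whitelist kernel: unrelated message",
]

rows = [row for row in map(parse_line, lines) if row is not None]
print(rows)
```

Each dictionary in `rows` could then be passed to whatever database API is in use, for example as named parameters to an insert statement.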
Upvotes: 1