Reputation: 3172

How can I split up this string?

I am currently trying to sanitize some log files so they are easier to read. I have been using the GNU cut command, which works fairly well, but I cannot think of a good way to remove the [INFO] part of each line.

logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh
logs/logs/server_1282136782.log:2010-08-18 16:27:32 [INFO] <pinguin> <pinguin>Â§F :/
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <TotempaaltJ> <TotempaaltJ>Â§F That helped A LOT
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <Rizual> Â§b<Rizual>Â§F hm?
logs/logs/server_1282136782.log:2010-08-18 16:29:10 [INFO] <pinguin> <pinguin>Â§F bah
logs/logs/server_1282136782.log:2010-08-18 16:29:35 [INFO] <TotempaaltJ> <TotempaaltJ>Â§F Finished my houses 
logs/logs/server_1282136782.log:2010-08-18 16:29:40 [INFO] <TotempaaltJ> <TotempaaltJ>Â§F or whatever
logs/logs/server_1282136782.log:2010-08-18 16:30:47 [INFO] <Rizual> Â§b<Rizual>Â§So much iron
logs/logs/server_1282136782.log:2010-08-18 16:30:58 [INFO] <TotempaaltJ> <TotempaaltJ>Â§F Ah yes, furnaces don't work.o
logs/logs/server_1282136782.log:2010-08-18 16:31:01 [INFO] <Rizual> Â§b<Rizual>Â§F They do
logs/logs/server_1282136782.log:2010-08-18 16:31:06 [INFO] <TotempaaltJ> <TotempaaltJ>Â§F Hm
logs/logs/server_1282136782.log:2010-08-18 16:31:08 [INFO] <Rizual> Â§b<Rizual>Â§F just need to use /lighter
logs/logs/server_1282136782.log:2010-08-18 16:31:12 [INFO] <Valrix> <Valrix>Â§FNotch fixed them?

Ultimately I want to get the lines down to something like the following. Keep in mind that the logs come in two formats: the older format, which contains the name twice (as in most of the lines above), and the newer format, which contains the name only once (as in the first log line, the <NateMar> one).

2010-08-31 23:06:51 <NateMar> where?!    
2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh (this one would require both the same editing as above, plus removal of the "extra" name Â§b<BoonTheMoon>Â§)

How should I go about doing this? I have thought about using awk, but I'm having a difficult time understanding how it would work, so I'm not sure how to set it up. Any help would be greatly appreciated, thanks!

Upvotes: 2

Answers (4)

Dave M

Reputation: 1322

You're on the right track with the cut command. The key to removing the [INFO] field is to leave it out of the final output. The first cut drops the file name in front of the first colon. In the second cut, the -f1,2,4- argument keeps every space-separated field except the 3rd, which at that point is just [INFO].

cut -d: -f2- Input.txt | cut -d' ' -f1,2,4- > Output.txt
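
To see what each stage contributes, here is a small sketch that feeds one sample line from the question through the same two cut calls. The function name strip_info is made up for illustration, and the sketch assumes the file name never contains a colon and that fields are separated by single spaces.

```shell
# strip_info: hypothetical helper wrapping the two cut stages.
# Stage 1 drops everything up to the first ':' (the file name prefix).
# Stage 2 keeps space-separated fields 1, 2 and 4 onward, skipping field 3 ([INFO]).
strip_info() {
    cut -d: -f2- | cut -d' ' -f1,2,4-
}

printf '%s\n' 'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' | strip_info
# prints: 2010-08-31 23:06:51 <NateMar> where?!
```

Because -f2- keeps everything after the first colon, the colons inside the timestamp (and any in the chat message) survive the first stage.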

Upvotes: 3

ghoti

Reputation: 46896

More takes on this, in sed, awk and bash:

[ghoti@pc ~]$ cat text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

[ghoti@pc ~]$ sed 's/^[^:]*://;s/[[][^]]*[]] //' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

[ghoti@pc ~]$ awk '{sub(/^[^:]+:/,""); $3=""} 1' text
2010-08-31 23:06:51  <NateMar> where?!
2010-08-15 22:59:53  <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[*\] }"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

These are simple, but they trade some robustness for brevity. For example, the awk script empties the third "word" and so leaves behind both of the spaces that delimited it.
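
The doubled space is easy to reproduce in isolation. In this minimal sketch (blank_field2 is a made-up name), assigning to a field makes awk rebuild the record by joining all fields with the output separator, so the emptied field still gets a separator on each side.

```shell
# blank_field2: hypothetical demo; empties field 2 and prints the rebuilt record.
blank_field2() {
    awk '{ $2 = "" } 1'
}

printf 'a b c\n' | blank_field2
# prints "a  c" with two spaces between a and c
```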

Note that as "elegant" as one-liners may seem for quick jobs, it's usually a better idea to be explicit with your code, especially when you have to deal with unknown input data or if you won't be inspecting your results immediately after you run things.

This is harder to read, but could be much safer, depending on your input:

[ghoti@pc ~]$ awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} 1' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

For the bash script, you'd be safer to use a character class rather than a glob:

[ghoti@pc ~]$ shopt -s extglob
[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[+([[:upper:]])\] /}"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

Note that the extglob shopt option lets you use more advanced pattern matching inside the parameter replacement pattern. For details, run man bash and look for Pathname Expansion.

UPDATE:

You've added a requirement that wasn't in the original question. Here's how you can handle it with awk:

awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} $3~/^<.+>$/{sub(/^(Â§b)?<[[:alpha:]]+>Â§/,"",$4)} 1' text

This simply removes the coloured nickname from the 4th field if the 3rd field looks like a bracketed nickname. It works for the sample you posted, but only you can determine whether it will work for the rest of your data.

And with bash:

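shopt -s extglob
# Split each line into date, time, [INFO] tag, nickname and the rest of the message.
while read -r date time tag nick line; do
  # ${date#*:} drops the file name; the last expansion strips a leading repeat of the
  # nickname, any non-space prefix in front of it, and the two characters after it.
  printf "%s %s %s %s\n" "${date#*:}" "$time" "$nick" "${line/#*([^< ])$nick??}"
done < text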

Upvotes: 3

ДМИТРИЙ МАЛИКОВ

Reputation: 22000

With sed this can be done more explicitly:

$> cat ./text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

$> sed -E -e 's/^.*log:([0-9]{4}-[0-9]{2}-[0-9]{2} )([0-9:]* )(\[[A-Z]*\] )(.*)$/\1\2\4/' ./text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

The whole idea is to match the fields of the log line as separate groups and then keep only the ones you need.
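
The same idea can be spelled out piece by piece, which may make the expression easier to adjust later. This is only a sketch: the variable names and the clean_log function are invented for illustration, and it assumes every line has the shape file:date time [LEVEL] rest, with no colon in the file name.

```shell
# Hypothetical variable names; each one describes a single field of the log line.
date_re='[0-9]{4}-[0-9]{2}-[0-9]{2}'
time_re='[0-9]{2}:[0-9]{2}:[0-9]{2}'
level_re='\[[A-Z]+\]'

# Group 1 = date, group 2 = time, group 3 = rest of the line.
# The file name prefix and the [LEVEL] tag are matched but not captured, so they are dropped.
clean_log() {
    sed -E "s/^[^:]*:($date_re) ($time_re) $level_re (.*)\$/\1 \2 \3/"
}

printf '%s\n' 'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' | clean_log
# prints: 2010-08-31 23:06:51 <NateMar> where?!
```

Lines that do not match the expected shape pass through unchanged, so unexpected input stays visible in the output.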

Upvotes: 1

Levon

Reputation: 143162

(potentially pending revision based on answer to the question posted in the comment above)

Using awk:

awk '{sub(".log:", ".log "); print $2, $3, $5, $6}' data.txt

will give you:

2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> Â§b<BoonTheMoon>Â§ohhhhhh

Explanation:

I changed the : after ".log" to a blank so that every field on the line is separated by white-space. The fields you were interested in are then 2, 3, 5 and 6, so I printed them with awk, using $ to get the content of each field. Note that this prints only the first word of the message: anything after field 6 is dropped.

Note that you can also use printf to format the data more precisely if that's needed.
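
As a sketch of the printf route, the variant below prints the date, time and nickname and then appends every remaining field, so messages of more than one word are kept whole. The function name keep_message is hypothetical, and the sketch assumes the same field layout as above.

```shell
# keep_message: hypothetical helper. After turning ".log:" into ".log ", the
# whitespace-separated fields are:
#   1 = file, 2 = date, 3 = time, 4 = [INFO], 5 = <nick>, 6 onward = message words
keep_message() {
    awk '{
        sub(/\.log:/, ".log ")            # split the file name off from the date
        printf "%s %s %s", $2, $3, $5     # date, time, nickname
        for (i = 6; i <= NF; i++)         # append every remaining word of the message
            printf " %s", $i
        printf "\n"
    }'
}
```

It could be run as keep_message < data.txt. One trade-off: because the message is rebuilt word by word, runs of several spaces inside a message collapse to a single space.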

Upvotes: 2
