Reputation: 3162
I am currently trying to sanitize some log files so they are easier to read. I have been using the GNU cut command, which works fairly well, but I cannot think of a good way to remove the [INFO] part of each line:
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
logs/logs/server_1282136782.log:2010-08-18 16:27:32 [INFO] <pinguin> <pinguin>§F :/
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <TotempaaltJ> <TotempaaltJ>§F That helped A LOT
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <Rizual> §b<Rizual>§F hm?
logs/logs/server_1282136782.log:2010-08-18 16:29:10 [INFO] <pinguin> <pinguin>§F bah
logs/logs/server_1282136782.log:2010-08-18 16:29:35 [INFO] <TotempaaltJ> <TotempaaltJ>§F Finished my houses
logs/logs/server_1282136782.log:2010-08-18 16:29:40 [INFO] <TotempaaltJ> <TotempaaltJ>§F or whatever
logs/logs/server_1282136782.log:2010-08-18 16:30:47 [INFO] <Rizual> §b<Rizual>§So much iron
logs/logs/server_1282136782.log:2010-08-18 16:30:58 [INFO] <TotempaaltJ> <TotempaaltJ>§F Ah yes, furnaces don't work.o
logs/logs/server_1282136782.log:2010-08-18 16:31:01 [INFO] <Rizual> §b<Rizual>§F They do
logs/logs/server_1282136782.log:2010-08-18 16:31:06 [INFO] <TotempaaltJ> <TotempaaltJ>§F Hm
logs/logs/server_1282136782.log:2010-08-18 16:31:08 [INFO] <Rizual> §b<Rizual>§F just need to use /lighter
logs/logs/server_1282136782.log:2010-08-18 16:31:12 [INFO] <Valrix> <Valrix>§FNotch fixed them?
I would ultimately like to get the lines down to something that resembles the following. Keep in mind that the logs come in two formats: the older format, which has two copies of the name (as in most of the lines above), and the newer format, which has the name only once (as in the first line, the <NateMar> one).
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh (this one requires the same editing as above, plus removal of the "extra" name §b<BoonTheMoon>§)
How should I go about doing this? I have thought about using awk, but I'm having a hard time understanding how it would work, so I'm not sure how to set it up. Any help would be greatly appreciated, thanks!
Upvotes: 2
Views: 242
Reputation: 1322
You're on the right track using the cut command. The key to removing the [INFO] field is to exclude it from the final output. The -f1,2,4- argument does just that by including all fields except the 3rd, which is just [INFO] at that point.
cut -d: -f2- Input.txt | cut -d' ' -f1,2,4- > Output.txt
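A minimal sketch for checking the pipeline on sample lines from the question; the function name strip_info is made up for illustration. The first cut (-d: -f2-) keeps every field from the second onward and re-joins them with ':', which is why the colons inside the timestamp survive.

```shell
#!/bin/sh
# strip_info (hypothetical name): the two-stage cut pipeline from this answer.
#   cut -d: -f2-        drop "logs/.../server_NNN.log:" (field 1); fields 2..n
#                       are re-joined with ":", so "23:06:51" stays intact
#   cut -d' ' -f1,2,4-  keep date, time and everything after the 3rd
#                       space-separated field ([INFO])
strip_info() {
  cut -d: -f2- | cut -d' ' -f1,2,4-
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' \
  | strip_info
# prints: 2010-08-31 23:06:51 <NateMar> where?!
```

Like the one-liner, this does not remove the duplicated name in the older log format.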
Upvotes: 3
Reputation: 46846
More takes on this, in sed, awk and bash:
[ghoti@pc ~]$ cat text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
[ghoti@pc ~]$ sed 's/^[^:]*://;s/[[][^]]*[]] //' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
[ghoti@pc ~]$ awk '{sub(/^[^:]+:/,""); $3=""} 1' text
2010-08-31 23:06:51  <NateMar> where?!
2010-08-15 22:59:53  <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[*\] }"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
While these are simple, they may be imperfect for the sake of brevity. For example, by blanking the third "word", the awk script leaves behind the spaces that delimited it, so two spaces end up between the time and the nickname.
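One way to get rid of that double space, as a sketch: after blanking $3, assign $0 = $0 so awk re-splits the record (the default field separator treats a run of blanks as a single separator), then $1 = $1 to rebuild it with single spaces. The function name strip_tag is hypothetical.

```shell
#!/bin/sh
# strip_tag (hypothetical name): the short awk one-liner plus "$0 = $0; $1 = $1".
#   $3 = ""   blank the [INFO] field (leaves two spaces behind)
#   $0 = $0   re-split; the default FS treats a run of blanks as one separator
#   $1 = $1   rebuild the record from the fields, joined by single spaces
strip_tag() {
  awk '{ sub(/^[^:]+:/, ""); $3 = ""; $0 = $0; $1 = $1 } 1'
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' \
  | strip_tag
# prints: 2010-08-31 23:06:51 <NateMar> where?!
```

As with the original one-liner, rebuilding the record also squeezes any runs of spaces inside the message itself.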
Note that as "elegant" as one-liners may seem for quick jobs, it's usually a better idea to be explicit with your code, especially when you have to deal with unknown input data or if you won't be inspecting your results immediately after you run things.
This is harder to read, but could be much safer, depending on your input:
[ghoti@pc ~]$ awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} 1' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
For the bash script, you'd be safer to use a character class rather than a glob:
[ghoti@pc ~]$ shopt -s extglob
[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[+([[:upper:]])\] /}"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
Note that the extglob shopt option lets you use more advanced pattern matching inside the parameter replacement pattern. Run man bash and look for Pathname Expansion for details.
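To see why the character class is safer: bash's ${line/pattern} replaces the longest match, so \[*\] can run past the tag when the message itself contains brackets. The same effect is easy to check with sed's greedy .*, swapped in here so the sketch runs under plain sh; the sample message containing "[this]" is made up.

```shell
#!/bin/sh
# Hypothetical sample line whose message itself contains a bracketed word.
line='2010-08-18 16:29:10 [INFO] <pinguin> see [this] now'

# Greedy: ".*" runs to the LAST "] ", eating the nickname and part of the text.
greedy() { printf '%s\n' "$1" | sed 's/\[.*] //'; }

# Class-restricted: only upper-case letters may sit between the brackets,
# so the leftmost match is just the "[INFO] " tag.
strict() { printf '%s\n' "$1" | sed 's/\[[[:upper:]]*] //'; }

greedy "$line"   # prints: 2010-08-18 16:29:10 now
strict "$line"   # prints: 2010-08-18 16:29:10 <pinguin> see [this] now
```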
UPDATE:
You've added a new requirement to your question that wasn't there originally. Here's how you can achieve your new requirement with awk:
awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} $3~/^<.+>$/{sub(/^(§b)?<[[:alpha:]]+>§/,"",$4)} 1' text
This simply removes the coloured nickname from the 4th field if the 3rd field looks like a bracketed nickname. It works for the sample you posted, but only you can determine whether it will work for the rest of your logs.
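A spelled-out version of that one-liner, as a sketch that may be easier to adapt. The function name clean_log is hypothetical, and it assumes nicknames contain only letters, digits and underscores (the one-liner uses [[:alpha:]]).

```shell
#!/bin/sh
# clean_log (hypothetical name): multi-line version of the awk one-liner above,
# reading raw "file.log:date time [TAG] <nick> message" lines on stdin.
# Assumption: nicknames contain only letters, digits and underscores.
clean_log() {
  awk '
  {
    # Drop the "logs/.../server_NNN.log:" prefix (up to the first colon).
    sub(/^[^:]+:/, "")

    # Blank a bracketed upper-case tag such as [INFO] in field 3, then
    # re-split so the nickname becomes $3 and the message starts at $4.
    if ($3 ~ /^[[][A-Z]+[]]$/) {
      $3 = ""
      $0 = $0
    }

    # Old format: the message starts with a coloured copy of the nickname
    # ("§b<nick>§" or "<nick>§"); remove that copy.
    if ($3 ~ /^<.+>$/)
      sub(/^(§b)?<[A-Za-z0-9_]+>§/, "", $4)

    # Rebuild the record with single spaces between fields.
    $1 = $1
    print
  }'
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' \
  'logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh' \
  | clean_log
```

On older lines with a colour letter after the nickname (for example <pinguin>§F :/), the letter after § is left in place, just as with the one-liner; whether it can be stripped safely depends on your logs.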
And with bash:
shopt -s extglob
while read date time tag nick line; do
    printf "%s %s %s %s\n" "${date#*:}" "$time" "$nick" "${line/#*([^< ])$nick??}"
done < text
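The ?? in that pattern matches exactly two characters, so what it removes depends on the locale and on whether a colour letter follows the §: in a UTF-8 locale, § counts as one character, and on the BoonTheMoon sample the second ? would also remove the o of ohhhhhh. A POSIX sh sketch that avoids extglob and wildcards by stripping two literal prefixes with ${var#pattern}; the function name clean_lines and the assumption that the duplicate is written as §b<nick>§ or <nick>§ are based only on the sample lines.

```shell
#!/bin/sh
# clean_lines (hypothetical name): POSIX sh take on the bash loop above,
# without extglob. Assumes the duplicated name is written as "§b<nick>§" or
# "<nick>§", as in the sample lines; a colour letter after the § is kept.
clean_lines() {
  while read -r date time tag nick line; do
    # "logs/.../server_NNN.log:2010-08-31" -> "2010-08-31"
    date=${date#*:}
    # $tag holds "[INFO]" and is simply not printed.
    # Quoting inside ${var#...} makes $nick match literally, not as a pattern.
    line=${line#"§b$nick§"}
    line=${line#"$nick§"}
    printf '%s %s %s %s\n' "$date" "$time" "$nick" "$line"
  done
}

printf '%s\n' \
  'logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh' \
  | clean_lines
# prints: 2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh
```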
Upvotes: 3
Reputation: 21972
With sed it can be done more demonstrably:
$> cat ./text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
$> sed -r -e 's/^.*log:([0-9]{4}-[0-9]{2}-[0-9]{2}\ )([0-9\ \:]*\ )(\[[A-Z]*\]\ )(.*)$/\1\2\4/' ./text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
The whole idea is to capture the fields of the log line in groups and then keep only the ones you need.
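The same capture-and-keep idea, sketched with a shorter pattern. It uses sed -E, which current GNU and BSD sed both accept (the -r above is the GNU spelling), and it assumes the tag is upper-case letters in square brackets. The function name keep_fields is hypothetical.

```shell
#!/bin/sh
# keep_fields (hypothetical name): capture "date time" in group 1, match the
# file prefix before it and the "[TAG] " after it, and write back only group 1.
keep_fields() {
  sed -E 's/^[^:]*:([0-9-]+ [0-9:]+) \[[A-Z]+] /\1 /'
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' \
  | keep_fields
# prints: 2010-08-31 23:06:51 <NateMar> where?!
```

Lines that do not match the pattern are printed unchanged, prefix included.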
Upvotes: 1
Reputation: 143022
(potentially pending revision based on answer to the question posted in the comment above)
Using awk:
awk '{sub(".log:", ".log "); print $2, $3, $5, $6}' data.txt
will give you:
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
Explanation:
I changed the : after ".log" to a blank, so that awk splits the whole line into fields on white-space. The fields you are interested in are 2, 3, 5 and 6, so I printed them with awk, using $ followed by the field number to get the content of each field on the line.
Note that you can also use printf to format the data more precisely if that's needed.
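Because print $2, $3, $5, $6 prints exactly four fields, a message of more than one word (for example "That helped A LOT") is cut short. A sketch that keeps every field from the 5th onward; it also escapes the dot, since a string used as an awk regex treats "." as any character. The function name keep_message is hypothetical.

```shell
#!/bin/sh
# keep_message (hypothetical name): like the one-liner, but prints fields 2
# and 3 (date, time) and then every field from 5 (the nickname) up to NF.
keep_message() {
  awk '{
    sub(/\.log:/, ".log ")       # separate the file name from the date
    out = $2 " " $3              # date and time
    for (i = 5; i <= NF; i++)    # skip $4, the [INFO] tag
      out = out " " $i
    print out
  }'
}

printf '%s\n' \
  'logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <TotempaaltJ> <TotempaaltJ>§F That helped A LOT' \
  | keep_message
# prints: 2010-08-18 16:27:37 <TotempaaltJ> <TotempaaltJ>§F That helped A LOT
```

Like the original, this keeps the duplicated name on old-format lines.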
Upvotes: 2