annedroiid

Reputation: 6647

Bash delete everything before the first blank on every line

I have a file of logs whose lines all start with a timestamp, followed by the log level and then the message, and I want a script that gets rid of the timestamp.

That is, I want a script that for every line of a file would turn:

21:22:34.571 DEBUG - some message

into

DEBUG - some message

I haven't used bash much so any advice would be appreciated.

Upvotes: 0

Views: 2366

Answers (4)

Since you're already using bash, you can use its built-in string manipulation (parameter expansion), as in this example:

for txt in "21:22:34.571 DEBUG - some message" \
    'another .555 message' \
    '33:44:55.666 two timestamps 00:12:34.567 !' \
    'A shorter timestamp 11:22'
do
   echo "'$txt' > '${txt##*\.[0-9][0-9][0-9] }'"
done

'21:22:34.571 DEBUG - some message' > 'DEBUG - some message'
'another .555 message' > 'message'
'33:44:55.666 two timestamps 00:12:34.567 !' > '!'
'A shorter timestamp 11:22' > 'A shorter timestamp 11:22'

Note how the third example, which has a second timestamp near the end, was truncated to just a "!", and how "another .555 " was stripped from the second example. See the explanation below for why.

Explanation and Alternatives

BASH has many built-in string handling capabilities. That means, among other things, that you can do quite a lot in BASH without any external utilities or subshells.

Replace Leading Characters (# or ##) (also % and %%)

${txt##*\.[0-9][0-9][0-9] } The '#' or '##' operator tells BASH to remove a leading portion of the string that matches the pattern that follows. The pattern is a shell glob pattern, not a regular expression, and the match is always anchored at the start of the string. The difference is that a single # removes the shortest match while ## is greedy and removes the longest. Here *\.[0-9][0-9][0-9] followed by a space matches ANYTHING that ends in a period (.), three digits, and a space. That is true of "another .555 " in the second example, so that leading portion was stripped. In the third example the longest match runs through the second timestamp, which is why only "!" is left.

If you know the timestamps are only at the beginning and only of the given format, you can do this instead

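${txt#*.[0-9][0-9][0-9] } Both # and ## match only at the beginning of the string; using # instead of ## tells bash to remove the shortest match, so only the text up to and including the first timestamp is removed. Note that this still strips "another .555 " from the second example.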

% and %% work the same way, except that they remove a match from the END of the string rather than the beginning.
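
As a rough illustration of the four operators side by side, here is a minimal sketch using a simple "anything up to a space" pattern. The sample line is taken from the question; the variable names are made up for this example.

```shell
#!/bin/bash
# Sample log line in the format shown in the question.
line='21:22:34.571 DEBUG - some message'

shortest_prefix=${line#* }    # remove up to the first space
longest_prefix=${line##* }    # remove up to the last space
shortest_suffix=${line% *}    # remove from the last space onwards
longest_suffix=${line%% *}    # remove from the first space onwards

printf '%s\n' "$shortest_prefix" "$longest_prefix" \
              "$shortest_suffix" "$longest_suffix"
# DEBUG - some message
# message
# 21:22:34.571 DEBUG - some
# 21:22:34.571
```

For the question as asked, the first form (${line#* }) is the one that drops just the timestamp.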

Substitute Matching Pattern (/ or //)

This is the MOST accurate for the examples given.

${txt//[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] /}

This is a little more tedious to write. The // means substitute all matches; a single / would only substitute the first match. By specifying the entire pattern (two digits, colon, two digits, colon, two digits, period, three digits and a space), the // form removes everything that matches that timestamp format, and it will NOT match the .555. This is the result:

'21:22:34.571 DEBUG - some message' > 'DEBUG - some message'
'another .555 message' > 'another .555 message'
'33:44:55.666 two timestamps 00:12:34.567 !' > 'two timestamps !'
'A shorter timestamp 11:22' > 'A shorter timestamp 11:22'
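
If only a timestamp at the very start of the line should go, bash also lets you anchor the substitution to the beginning of the string by starting the pattern with #. The following is a sketch under that assumption; strip_leading_timestamp is a hypothetical helper name, not something from the answer above.

```shell
#!/bin/bash
# Hypothetical helper: remove one HH:MM:SS.mmm timestamp (plus the space
# after it) only when it appears at the start of the string.
strip_leading_timestamp() {
    local txt=$1
    # A pattern that begins with "#" must match at the start of the string,
    # so a timestamp later in the message is left untouched.
    printf '%s\n' "${txt/#[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] /}"
}

strip_leading_timestamp '21:22:34.571 DEBUG - some message'
strip_leading_timestamp '33:44:55.666 two timestamps 00:12:34.567 !'
strip_leading_timestamp 'another .555 message'
# DEBUG - some message
# two timestamps 00:12:34.567 !
# another .555 message
```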

References

BASH string manipulations do not provide full "RegEx" (Regular Expression) syntax. But they are often quick and easy to use in lieu of sed, awk, tr and other tools.

There are many more string operations possible than those described above. Here are some more references. I haven't found a clearly readable authoritative reference.

Upvotes: 0

Sundeep

Reputation: 23667

grep can be used as well, by simply extracting everything from the first space onwards

$ cat ip.txt 
21:22:34.571 DEBUG - some message
21:23:34.571 DEBUG - some other message

This will leave a leading blank

$ grep -o ' .*' ip.txt 
 DEBUG - some message
 DEBUG - some other message

This won't (-P and \K need GNU grep with PCRE support)

$ grep -oP ' \K.*' ip.txt 
DEBUG - some message
DEBUG - some other message

Upvotes: 1

user8017719

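If you can use awk (note that emptying $1 makes awk rebuild the line, which leaves a leading blank and collapses runs of whitespace in the rest of the line):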

awk '$1="";1' data_file_name
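
A possible variant that avoids rebuilding the record is to edit $0 directly with sub(). This is only a sketch; strip_first_field is a hypothetical wrapper name and it reads from standard input rather than from data_file_name.

```shell
#!/bin/bash
# Hypothetical wrapper: drop everything up to and including the first
# space on each input line, leaving the rest of the line untouched.
strip_first_field() {
    # sub() modifies $0 in place; the trailing "1" prints every line.
    awk '{ sub(/^[^ ]* /, "") } 1'
}

printf '%s\n' '21:22:34.571 DEBUG - some  message' | strip_first_field
# DEBUG - some  message
```

On a real file this would be called as strip_first_field <data_file_name. Lines that contain no space are printed unchanged.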

Else, use the shell (very very slow):

#!/bin/bash
while IFS= read -r line; do    # IFS= keeps leading/trailing whitespace intact
    printf '%s\n' "${line#* }"
done <"data_file_name"

Upvotes: 2

GMichael

Reputation: 2776

You can try either sed or cut depending on the input data:

sed -e 's/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{3\} //' <data_file_name>


cut -c 14- <data_file_name>
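
If the timestamp width might vary, a field-based cut avoids counting characters. A minimal sketch, assuming the timestamp never contains a space; strip_with_cut is a hypothetical wrapper name that reads from standard input.

```shell
#!/bin/bash
# Hypothetical wrapper: keep everything after the first space on each line.
strip_with_cut() {
    # -d ' ' splits on single spaces; -f 2- keeps the second field onwards,
    # with the original spacing between the remaining fields preserved.
    cut -d ' ' -f 2-
}

printf '%s\n' '21:22:34.571 DEBUG - some message' | strip_with_cut
# DEBUG - some message
```

Lines that contain no space at all are passed through unchanged, since cut prints delimiter-less lines as-is unless -s is given.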

Upvotes: 2
