Reputation: 3750
Using sed
I want to parse Heroku's log-runtime-metrics like this one:
2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages
The desired output is:
worker.2: 54.01MB
(54.01MB being the memory_total value)
I could not get this to work, although I tried several alternatives, including:
sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'
What is wrong with my command? How can it be corrected?
Upvotes: 0
Views: 330
Reputation: 755006
I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:
sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'
The -e is not the opposite of -E, which is primarily a Mac OS X (BSD) sed option; the normal option for GNU sed is -r instead (recent GNU sed versions accept -E as well). The -e simply means that the next argument is an expression in the script.
This produces your desired output from the given line of data:
worker.2: 54.01MB
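As a quick check, here is a minimal sketch that feeds the sample line from the question through that command. The variable names line and result are only illustrative; nothing else is assumed beyond a POSIX shell and sed.

```shell
# Sample log line copied from the question.
line='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages'

# [^ ]* stops at the first space, so each capture holds exactly one field:
# \1 is the value after source=, \2 is the value after memory_total=.
result=$(printf '%s\n' "$line" |
  sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/')

printf '%s\n' "$result"
```

Because the leading and trailing .* consume the rest of the line, the substitution replaces the whole line with just the two captured fields.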
Bonus question: There are some odd lines within the stream, I can usually filter them out using a grep pipe like | grep memory_total. However if I try to use it along with the sed command, it does not work. No output is produced with this: heroku logs -t -s heroku | grep memory_total | sed.......
Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).
You should be able to use:
sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'
The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was in your original command; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.
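To see the filtering without a live log stream, here is a small sketch that pipes two lines into the same command. The first line is the sample from the question; the second is a made-up "odd" line without memory_total, included purely for illustration.

```shell
# Two input lines: the real sample, then a hypothetical unrelated line.
result=$(printf '%s\n' \
  '2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages' \
  '2016-01-29T00:38:44.000000+00:00 heroku[worker.2]: some unrelated message' |
  sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p')

# Only the line containing memory_total= survives, already rewritten.
printf '%s\n' "$result"
```

If a live grep | sed pipeline still prints nothing, one possible cause (an assumption here, not something confirmed for this case) is that grep buffers its output when writing to a pipe; GNU and BSD grep offer --line-buffered for that situation.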
Upvotes: 1
Reputation: 6857
The .+ after source= and memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ] to mean "anything except a space" so that it knows where to stop.
sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
Putting your content into https://regex101.com/ makes it really obvious what's going on.
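The difference can also be checked locally. The sketch below runs the original greedy pattern and the bounded one against the sample line from the question; it assumes a sed that accepts -E (BSD sed and recent GNU sed do), and the variable names are only illustrative.

```shell
# Sample log line copied from the question.
line='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages'

# Greedy captures: (.+) runs past the first space and swallows later fields.
greedy=$(printf '%s\n' "$line" |
  sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/')

# Bounded captures: ([^ ]+) ends at the first space after each key.
bounded=$(printf '%s\n' "$line" |
  sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The greedy line shows extra fields after worker.2 and after 54.01MB, while the bounded line is the desired worker.2: 54.01MB.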
Upvotes: 2