sarp

Reputation: 3750

Parsing a line with sed using regular expression

Using sed, I want to parse a line from Heroku's log-runtime-metrics like this one:

2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages

The desired output is:

worker.2: 54.01MB (where 54.01MB is the memory_total value)

I could not get this to work, although I tried several alternatives, including:

sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g'

What is wrong with my command? How can it be corrected?

Upvotes: 0

Views: 330

Answers (2)

Jonathan Leffler

Reputation: 755006

I'd go for the old-fashioned, reliable, non-extended sed expressions and make sure that the patterns are not too greedy:

sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/'

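The -e is not the opposite of -E; it simply means that the next argument is an expression in the script. The -E option, which enables extended regular expressions, is primarily a Mac OS X (BSD) sed option; the traditional option for GNU sed is -r instead, although recent GNU sed versions accept -E as well.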

This produces your desired output from the given line of data:

worker.2: 54.01MB
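
As a quick check, the command can be run against the sample line from the question. This is only a sketch: the variable names line and result are introduced here for illustration and are not part of the original command.

```shell
# Sample line copied from the question.
line='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=17492pages sample#memory_pgpgout=3666pages'

# [^ ]* stops at the first space, so each capture holds exactly one field.
result=$(printf '%s\n' "$line" |
  sed -e 's/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/')

printf '%s\n' "$result"
```

Because [^ ]* cannot cross a space, the first capture ends at worker.2 and the second at 54.01MB, regardless of how many fields follow.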

Bonus question: There are some odd lines within the stream; I can usually filter them out using a grep pipe like | grep memory_total. However, if I try to use it along with the sed command, it does not work. No output is produced with this:

 heroku logs -t -s heroku | grep memory_total | sed.......

Sometimes grep | sed is necessary, but it is often redundant (unless you are using a grep feature that isn't readily supported by sed, such as Perl regular expressions).

You should be able to use:

sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p'

The -n means "don't print by default". The /memory_total=/ matches the lines you're after; the s/// content is the same as before. I removed the g suffix that was there previously; the regex would never match multiple times anyway. I added the p to print the line when the substitution occurs.
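
A small sketch of the filtering behaviour, assuming a two-line input where only the first line carries metrics. The second line is a made-up example of an "odd" line, and the metrics line is shortened from the one in the question; input and filtered are illustrative names.

```shell
# Hypothetical input: one (shortened) metrics line and one unrelated line.
input='2016-01-29T00:38:43.662697+00:00 heroku[worker.2]: source=worker.2 dyno=heroku.17664470.d3f28df1-e15f-3452-1234-5fd0e244d46f sample#memory_total=54.01MB sample#memory_rss=54.01MB
2016-01-29T00:38:44.000000+00:00 heroku[worker.2]: some unrelated message'

# -n suppresses automatic printing; the trailing p prints only lines
# where the address /memory_total=/ matched and the substitution succeeded.
filtered=$(printf '%s\n' "$input" |
  sed -n -e '/memory_total=/ s/.*source=\([^ ]*\) .*memory_total=\([^ ]*\) .*/\1: \2/p')

printf '%s\n' "$filtered"
```

The unrelated line never reaches the output, so no separate grep stage is needed for this kind of filter.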

Upvotes: 1

Ewan Mellor

Reputation: 6857

The .+ after source= and the .+ after memory_total= are both greedy, so they accept as much of the line as possible. Use [^ ]+ to mean "one or more characters other than a space" so that each capture knows where to stop.

sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g'
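
To see the difference side by side, here is a rough comparison of the greedy pattern from the question and the corrected one. It assumes a sed that accepts -E, and it uses a shortened version of the log line; the variable names are only for illustration.

```shell
# Abbreviated form of the log line from the question (fewer fields, shorter dyno id).
line='heroku[worker.2]: source=worker.2 dyno=heroku.x sample#memory_total=54.01MB sample#memory_rss=54.01MB sample#memory_cache=0.00MB'

# Greedy captures: (.+) runs past the intended field boundaries.
greedy=$(printf '%s\n' "$line" |
  sed -E 's/.+source=(.+) .+memory_total=(.+) .+/\1: \2/g')

# Bounded captures: ([^ ]+) stops at the first space.
fixed=$(printf '%s\n' "$line" |
  sed -E 's/.+source=([^ ]+) .+memory_total=([^ ]+) .+/\1: \2/g')

printf 'greedy: %s\n' "$greedy"
printf 'fixed:  %s\n' "$fixed"
```

The greedy result still contains extra fields around the two values, while the bounded version yields only the source name and the memory_total value.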

Putting your content into https://regex101.com/ makes it really obvious what's going on.

Upvotes: 2
