Soddengecko

Reputation: 65

How to extract text from a string in Bash using Grep

I have been using grep with cut to gather information from log files, but I am having trouble extracting a string when the number of words in the line changes.

The line could be

[2014-12-31 21:00] Host: Word1 (LOCATION) [140.56 km] 38.582 ms

or

[2014-12-31 12:00] Host: Word1 Word2 (LOCATION) [76.50 km] 49.508 ms

or

[2014-12-31 12:00] Host: Word1 Word2 Word3 (LOCATION) [76.50 km] 49.508 ms

With my current code,

host_=$(grep 'Host:' "$FILE" | tail -n 1 | cut -d' ' -f4-)

I am able to get the following

Word1 Word2 (LOCATION) [140.56 km] 38.582 ms

I would like to get only the word(s) plus the location in parentheses, without the remaining information, so that I end up with this

Word1 Word2 (LOCATION)

The distance and time at the end of the string (whilst they change values) are always in that same position and "date/time" and the word "Host:" are always at the beginning of the string.

Could anyone here point me in the right direction to what I need to use?

I have tried googling and not found anything, but I am not exactly sure what I am looking for.

Thanks

Upvotes: 0

Views: 1850

Answers (2)

Kent

Reputation: 195039

grep 'Host:' "$FILE" | tail -n 1 | grep -Po '.*Host: \K.*\)'

The interesting part is the last grep:

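  • -P: interpret the pattern as a Perl-compatible regular expression (a GNU grep option)
  • -o: print only the matched part of the line
  • \K: drop everything matched so far from the reported match; similar to a lookbehind, but the prefix may have variable length
  • .*\): match everything up to and including the last closing parenthesis, which is the part you need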
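
If your grep lacks -P (it is not available everywhere), the same idea can be sketched with sed: throw away everything through "Host: " and keep the text up to the last closing parenthesis. This is only a rough equivalent of the pipeline above, not a drop-in replacement; the temporary file and its contents are a made-up sample built from the lines in the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample log, built from the example lines in the question.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
[2014-12-31 21:00] Host: Word1 (LOCATION) [140.56 km] 38.582 ms
[2014-12-31 12:00] Host: Word1 Word2 (LOCATION) [76.50 km] 49.508 ms
[2014-12-31 12:00] Host: Word1 Word2 Word3 (LOCATION) [76.50 km] 49.508 ms
EOF

# Take the last Host: line, then capture everything after "Host: "
# up to and including the last ")" on that line.
# In a basic regex, \( \) delimit the group and a bare ) is a literal.
host_=$(grep 'Host:' "$FILE" | tail -n 1 | sed -n 's/.*Host: \(.*)\).*/\1/p')

echo "$host_"
rm -f "$FILE"
```

With the sample file above this should print the host words and location of the last line only.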

Upvotes: 1

MattSizzle

Reputation: 3175

This is not that difficult if I understand the question correctly. The following regex, used with grep, returns only the requested part of each line.

Example

grep -Po '((?:\w+\s?)*\(\w+\))' FILE.TXT

FILE.TXT

[2014-12-31 21:00] Host: Word1 (LOCATION) [140.56 km] 38.582 ms
[2014-12-31 12:00] Host: Word1 Word2 (LOCATION) [76.50 km] 49.508 ms
[2014-12-31 12:00] Host: Word1 Word2 Word3 (LOCATION) [76.50 km] 49.508 ms

Result

Word1 (LOCATION)
Word1 Word2 (LOCATION)
Word1 Word2 Word3 (LOCATION)

The pattern matches consecutive words until it reaches a word in parentheses, which is the last thing captured. It also does not require any piping or output redirection.
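
Two things to keep in mind with the pattern above: it is not anchored on "Host:", and it expects the location to consist of word characters only. If either matters for your logs, a possible variation is to do the match in bash itself with [[ =~ ]]. The sketch below assumes bash; extract_host is a hypothetical helper name, and the second sample line uses a made-up location to show the difference.

```shell
#!/usr/bin/env bash
# extract_host is a hypothetical helper, not part of the original answer.
extract_host() {
    local line=$1
    # Extended regex: literal "Host: ", then capture everything
    # up to and including the last ")" on the line.
    local re='Host: (.*\))'
    if [[ $line =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# A line from the question.
extract_host '[2014-12-31 12:00] Host: Word1 Word2 (LOCATION) [76.50 km] 49.508 ms'
# Made-up line: the location contains a hyphen, which \w+ would not match.
extract_host '[2014-12-31 12:00] Host: Word1 (SOME-PLACE) [76.50 km] 49.508 ms'
```

Lines without "Host: " simply produce no output, so the function can be called on every line of a file in a while read loop.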

Thoughts

Personally, when I am working in a shell and have to do any string manipulation like this, I go straight for regex, since many of the standard text tools (grep, sed, awk) are built around it. Take grep, for instance: its name stands for "globally search for a regular expression and print". Regex is an invaluable tool, and the basics only take a few minutes to learn.

Upvotes: 1
