user2532296

Extracting a specific word between colons using sed, awk or grep

I have a file with contents like the following (and many more lines):

#set_property board_part my.biz:ab0860_1cf:part0:1.0 [current_project]
 set_property board_part my.biz:ab0820_1ab:part0:1.0 [current_project]

My ideal output is shown below, i.e. the text between the first ":" and the second ":" on the line that is not commented out:

ab0820_1ab

I generally use Python, with a regular expression along the lines of the one below, to get the result.

\s*set_property board_part trenz.biz:([a-zA-Z_0-9]+)

How can this be done quickly, and in a more generic way, using command-line tools such as sed or awk?

Answers (2)

The fourth bird

Your example data contains my.biz, but your pattern tries to match trenz.biz.

If GNU awk is available, you can use match() with a capture group and then print the first captured value, which gawk stores in a[1]:

awk 'match($0, /^\s*set_property board_part \w+\.biz:(\w+)/, a) {print a[1]}' file

The pattern matches:

  • ^ Start of string
  • \s* Match optional whitespace chars
  • set_property board_part Match literally
  • \w+\.biz: Match 1+ word chars followed by .biz: (the dot is escaped so that it matches a literal dot)
  • (\w+) Capture group 1, match 1+ word chars
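The \s and \w shorthands are GNU extensions rather than POSIX ERE, so this pattern may not be recognized by other awk implementations. As a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do), the same pattern can be written with POSIX character classes. The helper name extract_board_part is hypothetical and used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: the gawk pattern translated to POSIX ERE for sed -E.
#   \s  ->  [[:space:]]
#   \w  ->  [[:alnum:]_]
# -n suppresses default output; the p flag prints only lines where s/// matched.
extract_board_part() {
  sed -nE 's/^[[:space:]]*set_property board_part [[:alnum:]_]+\.biz:([[:alnum:]_]+).*/\1/p' "$@"
}

# Usage with the sample lines from the question (reads stdin when no file is given).
extract_board_part <<'EOF'
#set_property board_part my.biz:ab0860_1cf:part0:1.0 [current_project]
 set_property board_part my.biz:ab0820_1ab:part0:1.0 [current_project]
EOF
# prints: ab0820_1ab
```

The commented-out line is skipped because "#" is not matched by [[:space:]]*, so the ^ anchor fails there.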

Notes

  • If you only want to match trenz.biz, you can replace \w+\.biz with trenz\.biz
  • If the text is not at the start of the line, you can replace ^ with \s to match a whitespace char instead
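If only a POSIX awk is available (for example mawk or BSD awk, where match() takes just two arguments), a simpler sketch is to split each line on ":" so that the text between the first and second colon is field $2. This assumes, as in the sample data, that the wanted value is always the second colon-separated field; board_part_name is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the text between the 1st and 2nd ":" on every
# uncommented "set_property board_part" line, using only POSIX awk features.
board_part_name() {
  # -F: splits each line on ":", so $2 is the field after the first colon.
  # [ \t]* allows leading blanks; a leading "#" makes the regex fail.
  awk -F: '/^[ \t]*set_property board_part / { print $2 }' "$@"
}

# Usage with the sample lines from the question (reads stdin when no file is given).
board_part_name <<'EOF'
#set_property board_part my.biz:ab0860_1cf:part0:1.0 [current_project]
 set_property board_part my.biz:ab0820_1ab:part0:1.0 [current_project]
EOF
# prints: ab0820_1ab
```

Because the regex only checks for set_property board_part, the same command also works for trenz.biz or any other prefix before the first colon.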

Output

ab0820_1ab

Daweo

You might use GNU sed in the following way. Let file.txt contain

#set_property board_part my.biz:ab0860_1cf:part0:1.0 [current_project]
 set_property board_part my.biz:ab0820_1ab:part0:1.0 [current_project]
garbage garbage garbage

then

sed -n '/ set_property board_part my.biz/ s/[^:]*:\([^:]*\):.*/\1/ p' file.txt

gives the output

ab0820_1ab

Explanation: -n turns off default printing. / set_property board_part my.biz/ is a so-called address: the following commands are applied only to lines matching it. Because the address begins with a space, it does not match the commented-out line, where set_property is preceded by # rather than a space. The first command is a substitution (s) that uses a capturing group, delimited by \( and \). The regular expression is: zero or more non-: characters (i.e. everything before the 1st :), then :, then zero or more non-: characters (i.e. everything between the 1st and 2nd :) enclosed in the capturing group, then : followed by zero or more of any character (i.e. everything after the 2nd :). The whole line is replaced by the content of the 1st (and, in this case, only) capturing group. Finally, the trailing p makes GNU sed print the line once the substitution has taken place.
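The address in the command above contains an unescaped dot (so it would also match e.g. myXbiz) and depends on the exact leading space. A minimal variant, assuming the line may start with any amount of whitespace, anchors the address with ^ and escapes the dot; extract_my_biz is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical variant of the sed answer: the address is anchored with ^ and
# the dot in "my.biz" is escaped, so only uncommented my.biz lines match.
# The substitution itself is unchanged: keep the text between the 1st and 2nd ":".
extract_my_biz() {
  sed -n '/^[[:space:]]*set_property board_part my\.biz:/s/[^:]*:\([^:]*\):.*/\1/p' "$@"
}

# Usage with the sample file content from the answer.
extract_my_biz <<'EOF'
#set_property board_part my.biz:ab0860_1cf:part0:1.0 [current_project]
 set_property board_part my.biz:ab0820_1ab:part0:1.0 [current_project]
garbage garbage garbage
EOF
# prints: ab0820_1ab
```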

(tested in GNU sed 4.2.2)
