Rejith Krishnan

Reputation: 127

Extracting a word from a string in Unix

I want to extract the word that ends with a given pattern from a string in Unix. How can I achieve this?

For example, say the string is "sv_z = sample.scr". I have to search the string for ".scr" and, if it is found, extract the whole word that contains it. In this example the output should be sample.scr. The delimiters that bound the word can be a blank space, a double quote, or an equals sign.

Here are a few more examples:

sv_z=sample.scr
sv_z=urhk_dbCall("sample.scr")
sv_z="sample.scr"

Here's my expected output:

sample.scr
sample.scr
sample.scr

Upvotes: 1

Views: 3814

Answers (3)

Steve

Reputation: 54392

Here's one way using grep:

grep -o '[^ "=]*\.scr' file

Explanation:

  • The -o flag prints only the part of each line that matches the pattern, rather than the whole line.
  • [ ... ] is a character class. If a caret (^) is the first character in the class, the class is negated: it matches any single character that is not one of those listed.
  • * means match the preceding item (here, the character class) zero or more times.
  • \. matches a literal dot; an unescaped . would match any character.
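To see those pieces working together, here is a minimal sketch that runs the command on the three example lines from the question. The file name sample_input.txt is just a placeholder, and -o is assumed to be available (it is in GNU and BSD grep, though not required by POSIX).

```shell
# Hypothetical input file built from the question's example lines.
printf '%s\n' \
  'sv_z=sample.scr' \
  'sv_z=urhk_dbCall("sample.scr")' \
  'sv_z="sample.scr"' > sample_input.txt

# -o prints only the matched text; [^ "=]* takes the run of
# non-delimiter characters that sits directly before the literal ".scr".
result=$(grep -o '[^ "=]*\.scr' sample_input.txt)
printf '%s\n' "$result"
```

Each of the three lines should yield sample.scr, which is the output the question asks for.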

EDIT:

Alternatively, if you need stricter matching, you'll need Perl-compatible regular expressions (grep -P) and a positive lookahead. In the example below, the lookahead ensures that the match is followed by a double quote, a space, or the end of the line. You could also change the star (*) into a plus sign (+), which means match one or more times; that would filter out a bare .scr with nothing in front of it. It's not clear from your example input exactly what you're looking for here, though. Good luck.

grep -oP '[^ "=]*\.scr(?=("| |$))' file
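As a rough illustration of the difference, the sketch below runs the original pattern and a stricter variant (plus sign and lookahead combined) on a few made-up lines. The file name strict_input.txt and its contents are assumptions for the demo, and -P needs a grep built with PCRE support (GNU grep usually is; BSD and BusyBox grep may not be).

```shell
# Hypothetical input: one good word, one bare ".scr", one longer word.
printf '%s\n' \
  'sv_z="sample.scr"' \
  'sv_z=.scr' \
  'sv_z="sample.scrambled"' > strict_input.txt

# Original pattern: reports the bare ".scr" and also the "sample.scr"
# prefix of "sample.scrambled".
loose=$(grep -o '[^ "=]*\.scr' strict_input.txt)

# Stricter variant: + requires at least one character before ".scr",
# and the lookahead requires a quote, a space, or end of line after it.
strict=$(grep -oP '[^ "=]+\.scr(?=("| |$))' strict_input.txt)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

With this input the loose pattern reports three matches (sample.scr, .scr, sample.scr), while the strict one reports only the sample.scr from the first line.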

Upvotes: 2

Tedee12345

Reputation: 1210

Another solution:

awk -F= 'NR==1{print $2}{FS="\""}NR>1{print $2}' file

Upvotes: 0

Ed Morton

Reputation: 203502

In this awk script I'm using a variable "d" to contain the list of allowed delimiters to save repeating them multiple times in the script:

$ cat file
sv_z=sample.scr
sv_z=urhk_dbCall("sample.scr")
sv_z="sample.scr"
sv_z="unscrambled"
sv_z="sample.scrambled"

$ awk -v d=' "=' 'match($0,"["d"][^"d"]+\\.scr(["d"]|$)") { $0=substr($0,RSTART,RLENGTH); gsub("["d"]",""); print NR, $0 }' file
1 sample.scr
2 sample.scr
3 sample.scr

Compare with a grep -o solution like the one posted (note that the dot before scr is unescaped here, so it matches any character):

$ grep -n -o '[^ "=]*.scr' file
1:sample.scr
2:sample.scr
3:sample.scr
4:unscr
5:sample.scr

Notice those last 2 lines that you probably don't want in the grep output.
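For completeness, here is a small sketch that runs both the unescaped-dot and the escaped-dot grep patterns on the same five lines, so the effect of the backslash is visible. The file name compare_input.txt is a placeholder.

```shell
# Same five lines as the file shown above (file name is hypothetical).
printf '%s\n' \
  'sv_z=sample.scr' \
  'sv_z=urhk_dbCall("sample.scr")' \
  'sv_z="sample.scr"' \
  'sv_z="unscrambled"' \
  'sv_z="sample.scrambled"' > compare_input.txt

# Unescaped dot: "." matches any character, so "unscr" is reported.
unescaped=$(grep -n -o '[^ "=]*.scr' compare_input.txt)

# Escaped dot: line 4 has no literal ".scr" and drops out, but
# line 5 still matches the "sample.scr" prefix of "sample.scrambled".
escaped=$(grep -n -o '[^ "=]*\.scr' compare_input.txt)

printf 'unescaped:\n%s\nescaped:\n%s\n' "$unescaped" "$escaped"
```

Escaping the dot removes the line 4 match but not the line 5 one, which is why a trailing-delimiter check (as in the awk script above) is still needed.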

Upvotes: 0
