pingu87

Reputation: 113

Extract string between underscores and dot

I have strings like these:

/my/directory/file1_AAA_123_k.txt 
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:

AAA_123_k
CCC
KK_45

I found this solution that works:

string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo "$tmp" | sed 's/^[^_:]*[_:]//'

But I am wondering if there is a more elegant solution (e.g. a one-liner).

Upvotes: 3

Views: 1166

Answers (7)

anubhava

Reputation: 784878

A simpler sed solution without any capturing group:

sed -E 's/^[^_]*_|\.[^.]*$//g' file

AAA_123_k
CCC
KK_45
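
To try the command without creating a file, here is a minimal sketch that feeds the question's sample paths through stdin. The function name extract_between is hypothetical, chosen only for illustration. The alternation removes two pieces in one pass: everything up to the first underscore, and the last extension.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): wraps the sed command above
# so it reads paths from stdin instead of a file.
extract_between() {
  # First alternative: prefix up to and including the first underscore.
  # Second alternative: the last dot and whatever follows it.
  sed -E 's/^[^_]*_|\.[^.]*$//g'
}

printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' | extract_between
```

Note that this assumes the directory part contains no underscores, as in the question's samples.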

Upvotes: 2

RavinderSingh13

Reputation: 133428

With the samples shown, you could try the following with GNU grep.

grep -oP '.*?_\K([^.]*)' Input_file

Explanation: GNU grep's -o option prints only the matched part of each line, and -P enables PCRE. The regex .*?_\K([^.]*) extracts the text between the first _ and the first . that follows it.

Explanation of regex:

.*?_     ## Lazily match from the start of the line up to the first _
\K       ## Discard everything matched so far, so only what follows is printed
([^.]*)  ## Match everything up to, but not including, the next dot
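
As a quick check of that explanation, the sketch below runs the same regex on input from stdin. It assumes GNU grep built with PCRE support (-P is not available in every grep), and the function name after_first_underscore is hypothetical. The first sample carries an extra .2 suffix to show that the match stops at the first dot.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption). Requires GNU grep with -P.
after_first_underscore() {
  # -o prints only the matched text; \K drops the prefix up to the first "_".
  grep -oP '.*?_\K([^.]*)'
}

printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt.2' \
  '/my/directory/file2_CCC.txt' | after_first_underscore
```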

Upvotes: 2

sseLtaH

Reputation: 11207

Using sed

$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45

Upvotes: 2

Cyrus

Reputation: 88543

With bash version >= 3.0 and a regex:

[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
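
For several strings, the same match can be wrapped in a small function. This is only a sketch: the function name is hypothetical, and the regex is stored in a variable, which is a common way to avoid quoting surprises on the right-hand side of =~. Because .+ is greedy, a name with several dots is captured up to the last dot.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): prints the text between the
# first underscore and the last dot, using only bash's built-in regex match.
between_underscore_and_dot() {
  local re='_(.+)\.'
  [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

for path in \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt'
do
  between_underscore_and_dot "$path"
done
```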

Upvotes: 4

markp-fuso

Reputation: 33854

If you need to process the file names one at a time (e.g., within a while read loop), you can perform two parameter expansions, e.g.:

$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
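
The while read loop mentioned above could look like the following sketch. The function name strip_name is hypothetical, and the here-document stands in for whatever produces the file names in practice.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): applies the two parameter
# expansions shown above to a single path.
strip_name() {
  local tmp="${1#*_}"          # remove the shortest prefix ending in "_"
  printf '%s\n' "${tmp%%.*}"   # remove the longest suffix starting with "."
}

# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r path; do
  strip_name "$path"
done <<'EOF'
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
EOF
```

No external process is started per line, which is the main reason to prefer parameter expansion inside a loop.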

One idea to parse a list of file names at the same time:

$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt

$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45

Upvotes: 1

Tim Roberts

Reputation: 54635

This is easy, except that it includes the initial underscore:

ls | grep -o "_[^.]*"
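
One possible way to drop that initial underscore is to pipe the result through cut. This is a sketch rather than part of the original answer: the function name is hypothetical, and printf replaces ls so the example does not depend on the files existing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): same grep as above, then
# cut -c2- keeps everything from the second character onward.
drop_leading_underscore() {
  grep -o '_[^.]*' | cut -c2-
}

printf '%s\n' \
  'file1_AAA_123_k.txt' \
  'file2_CCC.txt' \
  'file2_KK_45.txt' | drop_leading_underscore
```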

Upvotes: -3

Wiktor Stribiżew

Reputation: 626691

You can use a single sed command, written in either POSIX BRE or POSIX ERE syntax:

sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"

Details:

  • ^ - start of string
  • .* - any text
  • / - a / char
  • [^_/]* - zero or more chars other than / and _
  • _ - a _ char
  • \([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the -E option) - Group 1: any zero or more chars other than /
  • \. - a dot
  • [^./]* - zero or more chars other than . and /
  • $ - end of string.

With -n, the default line output is suppressed, and the p flag prints the line only when the substitution succeeds.
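
Because the pattern anchors on the last / before looking for an underscore, it only inspects the file name itself. The sketch below illustrates this with the ERE variant. The function name name_part and the second path, which has underscores in its directory names, are hypothetical and not taken from the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): wraps the ERE variant above.
name_part() {
  sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$1"
}

name_part '/my/directory/file1_AAA_123_k.txt'
# Hypothetical path: underscores in the directories are skipped over.
name_part '/my_data/dir_2/file2_KK_45.txt'
```

A file name with no underscore or no dot produces no output at all, since -n suppresses lines where the substitution does not apply.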

Upvotes: 2
