Ilkay Isik

Reputation: 213

Extracting a substring until and including a matching word using bash tools

I have file names like these:

func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz

and from each file name I want to extract the part up to and including the word bold, so that in the end I have:

func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold

Any ideas how to do that?

Upvotes: 1

Views: 107

Answers (7)

Walter A

Reputation: 19982

f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
echo "${f//bold*/bold}"

Upvotes: 0

Benjamin W.

Reputation: 52112

This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:

$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold

The substitution "${fname%"${fname#*bold}"}" says:

  • Remove "${fname#*bold}" from the end of each filename, where
  • "${fname#*bold}" is the filename with everything up to and including bold removed from the front, i.e. the part after bold

Example for the first filename with explicit intermediate steps:

$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold
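If you need this in more than one place, the same nested idea can be wrapped in a small function. This is only a sketch: keep_through is a made-up name, and taking the word as a second argument is my assumption, not part of the answer above.

```shell
# keep_through NAME WORD  (hypothetical helper, not from the answer)
# Prints NAME up to and including the first occurrence of WORD.
# The body runs in a subshell ( ... ) so that "rest" does not leak
# into the calling shell.
keep_through() (
    # What follows the first WORD, i.e. the suffix we want to drop.
    rest=${1#*"$2"}
    # Strip that suffix from the end; the quotes keep it literal.
    printf '%s\n' "${1%"$rest"}"
)

keep_through func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz bold
```

The call at the end prints func/sub-01_task-rest_run-01_bold. Note that if the word does not occur at all, the inner expansion returns the whole name and the result is an empty line, so check for that if your input is mixed.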

Upvotes: 0

stack0114106

Reputation: 8711

Using Perl:

> echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -pe 's/(.*bold).*/$1/'
func/sub-01_task-biommtloc_run-01_bold

Upvotes: 0

glenn jackman

Reputation: 246774

This is (needlessly) clever: remove the prefix ending with "bold" and then do some substring index arithmetic based on the length of the suffix that's left over:

$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold

If $file does not contain "bold", then $keep will be empty. In that case we can give it the value of $file:

$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : "${keep:=$file}"
$ echo "$keep"
foobar

But seriously, do what chepner suggests.

Upvotes: 0

Lewis M

Reputation: 548

Is something like this what you want?

echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'

Hope this helps

Upvotes: 1

HackerBoss

Reputation: 829

I would recommend using sed for this task. First, put all of your input file names in a file in the current directory, say namelist.txt. The following works as long as your sed supports extended regular expressions, which most do, GNU sed in particular. The flag that enables them differs between platforms (GNU sed accepts -r or -E, BSD/macOS sed uses -E), so check your sed manual page. On my Linux, it is -r.

sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\1/' namelist.txt
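If the subject, task, and run parts vary, a looser pattern might be more convenient. The sketch below assumes names of the form sub-<digits>_task-<name without underscores>_run-<digits>_bold...; that layout is my reading of the examples in the question, not something the question states. strip_after_bold is a made-up name, and the second sample name is invented for illustration. It uses -E, which both GNU and BSD sed accept.

```shell
# strip_after_bold: hypothetical filter, reads names on stdin.
# Keeps everything through "_bold" for names that follow the assumed
# sub-NN_task-NAME_run-NN_bold... layout; other lines pass through unchanged.
strip_after_bold() {
    sed -E 's/(sub-[0-9]+_task-[^_]+_run-[0-9]+_bold).+/\1/'
}

# First name is from the question, the second one is invented.
printf '%s\n' \
    func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz \
    func/sub-02_task-rest_run-03_bold_space-T1w_preproc.nii.gz |
strip_after_bold
```

With a real list you would run strip_after_bold < namelist.txt instead of the printf.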

Upvotes: -1

chepner

Reputation: 531055

The easiest thing to do is to just remove bold and everything after it, then append bold again. Obviously, this only works if the terminating string is fixed, as it is in this case.

$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold

Upvotes: 3
