Sophie

Reputation: 87

Bash: filtering and replacing by pattern

Given this string of paths, separated by spaces:

path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1
  1. I would like to get a string of paths, separated by spaces, containing only the ones that start with path/folderA/.
    Output: path/folderA/fileA1 path/folderA/subFolderA/fileA2

  2. Then remove any match of path/folderA/ from this string.
    Final output: fileA1 subFolderA/fileA2

Could this be done with a single line?

Upvotes: 0

Views: 746

Answers (3)

thanasisp

Reputation: 5975

With grep.

echo " $str" | grep -oP '(?<=\spath/folderA/)\S+' | xargs

-P enables Perl-compatible regular expression (PCRE) syntax, so you can use (?<=pattern), a positive look-behind assertion. -o prints only the matched part; since the look-behind is zero-width and not part of the match, that is just \S+, a run of non-whitespace characters (up to the next space, tab, newline, etc.).

Also, grep -o prints each match on its own line, so you have to pipe to tr '\n' ' ', xargs, or similar to get everything on one line.
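The joining options behave slightly differently: tr '\n' ' ' leaves a trailing space and drops the final newline, and xargs (without -0 or -d) gives quotes and backslashes special meaning. A sketch using POSIX paste -s instead, with the grep -o output simulated by printf:

```shell
# Stand-in for the newline-separated output of grep -o
lines=$(printf '%s\n' fileA1 subFolderA/fileA2)

# paste -s joins all input lines into one, separated by the -d delimiter (a space)
joined=$(printf '%s\n' "$lines" | paste -sd ' ' -)
echo "$joined"    # fileA1 subFolderA/fileA2
```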

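Edit: to match only at the beginning of a path, I added \s (one whitespace character) to the look-behind and feed the input as " $str", so the first path is also preceded by a space. This seemed the easier fix, because \b also matches right after a / (so path/path/folderA/... would match too), and (^|\s) fails with grep: lookbehind assertion is not fixed length. Testing with this input works: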

> echo "$str"
path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/path/folderA/not
> echo " $str" | grep -owP '(?<=\spath/folderA/)\S+' | xargs
fileA1 subFolderA/fileA2
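If you would rather not pad the input with a leading space, PCRE's \K is another option, assuming GNU grep built with -P support: (^|\s) is then ordinary matching rather than a look-behind, so its varying length is no problem, and \K drops everything matched so far from the reported match. A sketch:

```shell
str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/path/folderA/not"

# (^|\s) anchors the prefix at line start or after whitespace;
# \K discards that part, so -o prints only the \S+ that follows it
result=$(grep -oP '(^|\s)path/folderA/\K\S+' <<< "$str" | paste -sd ' ' -)
echo "$result"    # fileA1 subFolderA/fileA2
```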

Upvotes: 1

David C. Rankin

Reputation: 84642

You can do it simply with awk, matching the last run of word characters in each field and printing it, e.g.

awk '{for (i=1; i<=NF; i++) if ($i ~ /folderA/) { match($i,/\w+$/); print substr($i,RSTART,RLENGTH)}}' <<< "$path_str"

Example Use/Output

path_str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1"
awk '{for (i=1; i<=NF; i++) if ($i ~ /folderA/) { match($i,/\w+$/); print substr($i,RSTART,RLENGTH)}}' <<< "$path_str"
fileA1
fileA2

You can adjust the output format as desired, for example printing everything on one line or using command substitution to capture the output in a new array.
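The awk command above prints only the last path component (fileA2 rather than subFolderA/fileA2), and \w is a GNU awk extension. If you need the exact output from the question, a sketch using only POSIX awk built-ins (index, substr, length) could look like this. It assumes the prefix contains no backslashes, because awk -v processes escape sequences in the assigned value:

```shell
path_str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1"
prefix="path/folderA/"

# index($i, p) == 1 means field i starts with the prefix (a literal string, not a regex);
# substr() drops the prefix, and sep puts a space only between kept fields
result=$(awk -v p="$prefix" '{
  out = ""; sep = ""
  for (i = 1; i <= NF; i++) {
    if (index($i, p) == 1) {
      out = out sep substr($i, length(p) + 1)
      sep = " "
    }
  }
  print out
}' <<< "$path_str")

echo "$result"    # fileA1 subFolderA/fileA2
```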

Using Bash Parameter Expansions

If you want to use parameter expansions with substring removal, you can use a simple loop and the expansion ${var##*/} to remove everything up to and including the final '/' from each path component, e.g.

path_str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1"
for i in $path_str; do 
    [[ $i =~ folderA ]] && echo "${i##*/}"
done
fileA1
fileA2

For your case, the parameter expansion is likely the most efficient, as it is built into your shell and avoids starting an external awk process. However, if you had hundreds of thousands of components, I'd probably let awk handle it.
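The loop above likewise keeps only the basename and matches folderA anywhere in the path. A sketch of a pure-Bash variant that keeps the subfolder and anchors the match at the start might look like the following; strip_prefix_paths is a hypothetical helper name, and splitting with read -ra (instead of an unquoted for i in $path_str) also avoids accidental glob expansion:

```shell
# Hypothetical helper: keep the words of STRING that start with PREFIX,
# strip PREFIX from each, and print them space-separated on one line.
strip_prefix_paths() {
  local prefix=$1 p
  local -a parts=() out=()
  read -ra parts <<< "$2"            # split STRING on whitespace into an array
  for p in "${parts[@]}"; do
    # The pattern must match the whole word, so "$prefix"* means "starts with PREFIX"
    if [[ $p == "$prefix"* ]]; then
      out+=("${p#"$prefix"}")        # remove the prefix from the front only
    fi
  done
  echo "${out[*]}"                   # join with the first character of IFS (a space by default)
}

strip_prefix_paths "path/folderA/" \
  "path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1"
# fileA1 subFolderA/fileA2
```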

The set of POSIX-compliant parameter expansions with substring removal is:

${var#pattern}      Strip shortest match of pattern from front of $var
${var##pattern}     Strip longest match of pattern from front of $var
${var%pattern}      Strip shortest match of pattern from back of $var
${var%%pattern}     Strip longest match of pattern from back of $var
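Applied to one of the sample paths, the four forms behave as follows (the last line removes the question's prefix with a literal pattern):

```shell
p="path/folderA/subFolderA/fileA2"
echo "${p#*/}"              # folderA/subFolderA/fileA2   shortest "*/" removed from the front
echo "${p##*/}"             # fileA2                      longest "*/" removed from the front
echo "${p%/*}"              # path/folderA/subFolderA     shortest "/*" removed from the back
echo "${p%%/*}"             # path                        longest "/*" removed from the back
echo "${p#path/folderA/}"   # subFolderA/fileA2           literal prefix removed from the front
```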

Bash provides many more parameter expansions beyond those defined by POSIX, including everything from substring replacement to character case conversion.
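A few of those Bash-only expansions, shown on a sample path (the case-conversion forms need Bash 4 or later):

```shell
s="path/folderA/fileA1"
echo "${s/folderA/folderB}"   # path/folderB/fileA1   first match replaced
echo "${s//A/_}"              # path/folder_/file_1   every match replaced
echo "${s/#path\//}"          # folderA/fileA1        match anchored at the start (\/ is a literal slash)
echo "${s:5:7}"               # folderA               substring: offset 5, length 7
echo "${s^^}"                 # PATH/FOLDERA/FILEA1   upper-case
echo "${s,,}"                 # path/foldera/filea1   lower-case
```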

Let me know if you have further questions.

Upvotes: 0

Todd A. Jacobs

Reputation: 84443

If you're starting with a string, you run the risk that embedded spaces, newlines, or other problematic characters will throw things off. That's why it's usually better to work with globs or null-terminated values.
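As a rough sketch of the null-terminated approach, assume the paths arrive NUL-delimited (as find ... -print0 produces). Here printf simulates that input with hypothetical names, one of which contains a space that a space-separated string could not represent:

```shell
prefix="path/folderA/"
kept=()

# read -d '' consumes one NUL-terminated record at a time;
# IFS= and -r keep leading/trailing whitespace and backslashes intact
while IFS= read -r -d '' p; do
  if [[ $p == "$prefix"* ]]; then
    kept+=("${p#"$prefix"}")
  fi
done < <(printf '%s\0' "path/folderA/file one" "path/folderB/fileB1" "path/folderA/sub/two")

printf '%s\n' "${kept[@]}"
# file one
# sub/two
```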

That said, you can use various builtins and expansions to get the results you want from your given example. Note that you must escape your forward slashes properly or store them in a quoted string to avoid interfering with the expansion syntax. For example:

path_str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/folderB/fileB1"
match_str="path/folderA/"

read -ra paths <<< "$path_str"
for i in "${!paths[@]}"; do
    [[ ! "${paths[i]}" =~ $match_str ]] && unset 'paths[i]'
done

echo "${paths[@]//$match_str}"

This will print:

fileA1 subFolderA/fileA2
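One caveat: =~ $match_str matches the prefix anywhere in an element, and // removes it anywhere, so an element such as path/path/folderA/not would be kept and turned into path/not. A sketch of an anchored variant, using a glob test with a quoted prefix and the /# form, which only strips a match at the start of each element:

```shell
path_str="path/folderA/fileA1 path/folderA/subFolderA/fileA2 path/path/folderA/not"
match_str="path/folderA/"

read -ra paths <<< "$path_str"
for i in "${!paths[@]}"; do
    # == with a quoted prefix and a trailing * is true only if the element starts with it
    [[ ${paths[i]} == "$match_str"* ]] || unset 'paths[i]'
done

# /# removes the (quoted, hence literal) prefix only at the start of each remaining element
echo "${paths[@]/#"$match_str"}"    # fileA1 subFolderA/fileA2
```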

Upvotes: 2
