Reputation: 3722
I am trying to separate out parts of a path as follows. My input path takes the following possible forms:
bucket
bucket/dir1
bucket/dir1/dir2
bucket/dir1/dir2/dir3
...
I want to separate the first part of the path (bucket) from the rest of the string, if present (dir1/dir2/dir3/...), and store both in separate variables.
The following gives me something close to what I want:
❯ BUCKET=$(echo "bucket/dir1/dir2" | sed 's@\(^[^\/]*\)[\/]\(.*\)@\1@')
❯ EXTENS=$(echo "bucket/dir1/dir2" | sed 's@\(^[^\/]*\)[\/]\(.*\)@\2@')
❯ echo $BUCKET $EXTENS
bucket dir1/dir2
However, it fails if I only have bucket as input (without a slash):
❯ BUCKET=$(echo "bucket" | sed 's@\(^[^\/]*\)[\/]\(.*\)@\1@')
❯ EXTENS=$(echo "bucket" | sed 's@\(^[^\/]*\)[\/]\(.*\)@\2@')
❯ echo $BUCKET $EXTENS
bucket bucket
... because, in the absence of the first '/', the pattern does not match, so no substitution takes place and sed prints the input unchanged. When the input is just 'bucket', I would like $EXTENS to be set to the empty string "".
Thanks!
Upvotes: 0
Views: 78
Reputation: 29212
For something so simple you could use bash built-in parameter expansion instead of launching sed:
$ path="bucket/dir1/dir2"
$ bucket="${path%%/*}"
$ extens="${path#"$bucket"}"
$ printf '|%s|%s|\n' "$bucket" "$extens"
|bucket|/dir1/dir2|
$ path="bucket"
$ bucket="${path%%/*}"
$ extens="${path#"$bucket"}"
$ printf '|%s|%s|\n' "$bucket" "$extens"
|bucket||
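Note that extens keeps its leading slash here. If the remainder should come without it (dir1/dir2 rather than /dir1/dir2, as in the question), a minimal sketch along the same lines could look like this. The function name split_path is made up for illustration; it sets the same bucket and extens variables as above.

```shell
# Hypothetical helper: split "bucket/dir1/dir2" into bucket and the rest.
split_path() {
    p=$1
    bucket=${p%%/*}               # drop the longest suffix starting at the first "/"
    case $p in
        */*) extens=${p#*/} ;;    # drop the shortest prefix ending at the first "/"
        *)   extens= ;;           # no slash at all: nothing follows the bucket
    esac
}

split_path "bucket/dir1/dir2"
printf '|%s|%s|\n' "$bucket" "$extens"
split_path "bucket"
printf '|%s|%s|\n' "$bucket" "$extens"
```

This should print |bucket|dir1/dir2| and then |bucket||. The case test is needed because ${p#*/} leaves the string untouched when it contains no slash.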
But if you really want to use sed and capture groups:
$ declare -a bucket_extens
$ mapfile -td '' bucket_extens < <(printf '%s' "bucket/dir1/dir2" | sed -E 's!([^/]*)(.*)!\1\x00\2!')
$ printf '|%s|%s|\n' "${bucket_extens[@]}"
|bucket|/dir1/dir2|
$ mapfile -td '' bucket_extens < <(printf '%s' "bucket" | sed -E 's!([^/]*)(.*)!\1\x00\2!')
$ printf '|%s|%s|\n' "${bucket_extens[@]}"
|bucket||
We use the extended regex (-E) to simplify a bit, and ! as separator of the substitute command. The first capture group is simply anything not containing a slash and the second is everything else, including nothing if there's nothing else.
In the replacement string we separate the two capture groups with a NUL character (\x00, an escape that GNU sed understands). We then use mapfile to assign the result to the bash array bucket_extens.
The NUL trick is a way to deal with file names containing spaces, newlines and so on: NUL is the only character that can never appear in a path. The -d '' option of mapfile indicates that the records to map are separated by NUL instead of the default newline.
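To see why the NUL delimiter matters, here is a small sketch without sed. The two values are invented for illustration, and it assumes bash 4.4 or later, where mapfile accepts -d.

```shell
#!/usr/bin/env bash
# Two illustrative values: one with a space, one with an embedded newline.
first='my bucket'
second=$'dir 1/line1\nline2'

# Emit both values NUL-terminated, then read them back with NUL as the delimiter.
mapfile -td '' parts < <(printf '%s\0' "$first" "$second")

printf 'count=%d\n' "${#parts[@]}"
printf '[%s]\n' "${parts[@]}"
```

The array ends up with exactly two elements, and the second one still contains its newline. With the default newline delimiter it would have been split in two.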
Upvotes: 2
Reputation: 425063
Don't capture anything. Instead, just match what you don't want and replace it with nothing:
BUCKET=$(echo "bucket" | sed 's@/.*@@') # bucket
BUCKET=$(echo "bucket/dir1/dir2" | sed 's@/.*@@') # bucket
EXTENS=$(echo "bucket" | sed 's@[^/]*@@') # blank
EXTENS=$(echo "bucket/dir1/dir2" | sed 's@[^/]*@@') # /dir1/dir2
Upvotes: 1
Reputation: 22022
As you are putting a mandatory slash in the regex, a string with no slashes will not match. Let's make the slash optional with /\?. (A backslash before ? is required because sed uses BRE by default; \? itself is a GNU sed extension.) Then would you please try:
#!/bin/bash
#path="bucket/dir1/dir2"
path="bucket"
bucket=$(echo "$path" | sed 's@\(^[^/]*\)/\?\(.*\)@\1@')
extens=$(echo "$path" | sed 's@\(^[^/]*\)/\?\(.*\)@\2@')
echo "$bucket" "$extens"
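If \? is not available, the same idea can be sketched with extended regular expressions, where ? needs no backslash. This assumes a sed that accepts -E (GNU and BSD sed both do); the function names get_bucket and get_extens are hypothetical.

```shell
# Hypothetical helpers; both use ERE (-E) with an optional slash after the bucket.
get_bucket() { printf '%s\n' "$1" | sed -E 's@^([^/]*)/?(.*)$@\1@'; }
get_extens() { printf '%s\n' "$1" | sed -E 's@^([^/]*)/?(.*)$@\2@'; }

for path in "bucket" "bucket/dir1/dir2"; do
    printf '|%s|%s|\n' "$(get_bucket "$path")" "$(get_extens "$path")"
done
```

This should print |bucket|| and then |bucket|dir1/dir2|. Because the slash is optional, the pattern always matches, so the second group is simply empty when nothing follows the bucket.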
Upvotes: 0