opennomad

Reputation: 137

Why does `?` in this pattern match more than 1 character?

FINAL UPDATE: It turns out it's working as expected, and I was entirely blind to those numbers actually still being there. My apologies for wasting everyone's time, and thanks to everyone for trying to show me the light.

UPDATE: I understand that what I'm seeing doesn't make sense, and I'm trying to find out why it happens. I understand the manual and what it should do, but that is not what I'm seeing. Here is a super concise version of what I'm seeing:

bash-4.4$ source='ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S'
bash-4.4$ echo -e "src: $source\nnew: ${source/,U=[[:digit:]]?}"
src: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S
new: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae46:2,S

This is pasted output which I get from bash v5 and v4, and I understand that this should not be, but it is what I'm seeing. It is clearly removing more than 1 of the digits. Why is the ? matching more than one of the digits?

The remainder is additional detail, but the essential issue is above.

I'm working with maildir files, and I'm looking to remove a substring of the form ,U=123, where the value after the = is an integer. The following code works, but I'm unclear as to why the ? in the parameter expansion does the right thing.

#!/usr/bin/env bash

source="$1"

echo "src: $source"
echo "new: ${source/,U=[[:digit:]]?}"

The script accepts filenames that look like this:

ff6a3828-dff0-11ef-bf71-fcb3bcddc8ae,U=8654:2,
ff69dbee-dff0-11ef-bf71-fcb3bcddc8ae,U=8650:2,F
ff69f368-dff0-11ef-bf71-fcb3bcddc8ae,U=8651:2,
ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S

An example run:

$ ./maildir-move 'ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S'
src: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S
new: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae46:2,S

man bash refers to the "Pattern Matching" section (from the parameter expansion section) which states that:

? Matches any single character.

So what I have is working, but I don't understand why. It's clearly matching more than 'a single character', but why?

Upvotes: 1

Views: 130

Answers (2)

chepner

Reputation: 532153

? only matches the 6, following the 8 matched by [[:digit:]]. You can see this more clearly if you insert some spaces into the value of new to replace the matched characters, instead of removing them altogether.

src: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S
new: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae     46:2,S
                                         ^^^^^
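Output like the above can be reproduced with something along these lines. This is only a sketch: it assumes the same value of source as in the question, and the variable name new is just for illustration. The replacement is five spaces, one per matched character.

```shell
#!/usr/bin/env bash
source='ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S'

# Same pattern as in the question, but the match is replaced with
# five spaces instead of being deleted, so its width stays visible.
new="${source/,U=[[:digit:]]?/     }"

echo "src: $source"
echo "new: $new"
```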

Only 5 characters are matched and removed. If you are trying to remove the 46 as well, you should use extended globs (which are equivalent to regular expressions in power) as recommended by other answers.
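As a rough sketch of the extended-glob approach, assuming it is acceptable to turn on extglob in your script (the variable name new is only for illustration): with extglob enabled, +(...) matches one or more occurrences of the enclosed pattern, so the whole run of digits is removed.

```shell
#!/usr/bin/env bash
# extglob enables the +(pattern) syntax: one or more occurrences.
shopt -s extglob

source='ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S'

# Remove ,U= followed by one or more digits.
new="${source/,U=+([[:digit:]])}"

echo "src: $source"
echo "new: $new"   # new: ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae:2,S
```

Because the pattern only matches digits, the :2,S suffix is left untouched.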

If you don't need to verify that what you are deleting is an integer, and only want to remove every character between ,U= and :, you might use the following:

echo "new: ${source/,U=*:/:}"

which matches everything up to and including the :, then replaces it with : instead of the empty string, to restore the original :.

Upvotes: 4

KamilCuk

Reputation: 141708

Why does ? in this pattern match more than 1 character?

This is untrue: ? matches exactly one character, the 6.

why

Glob is not regex. In glob ? matches one character. In regex ? matches one or zero of the preceding expression. See man 7 glob vs man 7 regex.
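A small sketch of the difference, using a shortened string for readability (the variable names s, re, glob_result and regex_match are made up for this example). The same text ,U=[[:digit:]]? is used once as a glob in a parameter expansion and once as a regex with bash's =~ operator:

```shell
#!/usr/bin/env bash
s=',U=8646:2,S'

# Glob: ? is one arbitrary character, so ,U= plus a digit plus one more
# character (",U=86") is removed.
glob_result="${s/,U=[[:digit:]]?}"
echo "glob leaves:   $glob_result"    # 46:2,S

# Regex (ERE): ? makes the preceding [[:digit:]] optional, so the match
# is ",U=" plus at most one digit.
re=',U=[[:digit:]]?'
if [[ $s =~ $re ]]; then
    regex_match="${BASH_REMATCH[0]}"
    echo "regex matched: $regex_match"    # ,U=8
fi
```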

Let's take the string:

ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S

And glob:

,U=[[:digit:]]?

Starting at the position of , in the string, the following is matched:

part of glob expression    matched part of the string
,                          ,
U                          U
=                          =
[[:digit:]]                8
?                          6

Another representation:

ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae,U=8646:2,S
                                        ^--------- ?
                                       ^---------- [[:digit:]]
                                    ^^^----------- ,U=

The resulting string with ,U=86 removed is:

ff6980f4-dff0-11ef-bf71-fcb3bcddc8ae46:2,S

Upvotes: 4
