Reputation: 9701
I've seen this example:
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}
Which follows this syntax: ${variable//pattern/replacement}
Unfortunately the pattern field doesn't seem to support full regex syntax (if I use . or \s, for example, it tries to match the literal characters).
How can I search/replace a string using full regex syntax?
Upvotes: 266
Views: 376979
Reputation: 57
Trying to make it as small as possible using bash:
ls ${PATH//:/ }
It comes with a header per directory in your path, which you could choose to filter out with grep.
Upvotes: 0
Reputation: 10606
You can use Python. This will not be efficient, but it gets the job done with a slightly more flexible syntax.
The following Python script will replace "FROM" (but not "notFROM") with "TO".
regex_replace.py
import sys
import re
for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)
You can apply that on a text file, like
$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla
bla notFROM FROM
bla FROM
bla bla
$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla
bla notFROM TO
bla TO
bla bla
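You can also embed the Python code directly in a bash script:
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo "$hello"
# Quote the heredoc delimiter so the shell leaves the Python source untouched.
PYTHON_CODE=$(cat <<'END'
import sys
import re
for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo "$hello" | python -c "$PYTHON_CODE"
Output: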
ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
Upvotes: -6
Reputation: 4623
Set the variable:
hello=ho02123ware38384you443d34o3434ingtod38384day
Then echo it with a pattern replacement on the variable (the pattern is a glob-style bracket expression, not a regex):
echo ${hello//[[:digit:]]/}
and this will print:
howareyoudoingtodday
Extra: if you'd like the opposite (to keep only the digit characters):
echo ${hello//[![:digit:]]/}
and this will print:
021233838444334343438384
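For reference, a short sketch of the other substitution forms bash offers for the same ${variable/pattern/replacement} expansion, plus the extglob option, which adds repetition operators such as +(...) to these glob patterns. The variable name path and its value are made up for illustration; the patterns are still globs, not regexes.

```shell
#!/usr/bin/env bash
path='2024-report-2024.txt'   # illustrative value, not from the thread
echo "${path/2024/YYYY}"      # first match only:  YYYY-report-2024.txt
echo "${path//2024/YYYY}"     # every match:       YYYY-report-YYYY.txt
echo "${path/#2024/YYYY}"     # only at the start: YYYY-report-2024.txt
echo "${path/%.txt/.md}"      # only at the end:   2024-report-2024.md

shopt -s extglob              # enables +(...), *(...), ?(...), @(...), !(...)
hello=ho02123ware38384you
echo "${hello//+([0-9])/-}"   # each run of digits becomes one dash: ho-ware-you
```

With extglob enabled, +([0-9]) behaves much like the regex [0-9]+, which covers many cases that would otherwise seem to need sed.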
Upvotes: 1
Reputation: 36048
In this example, the input is hello ugly world; it searches for the regex bad|ugly and replaces it with nice.
#!/bin/bash
# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input           Example: hello ugly world
# arg2 = search regex    Example: bad|ugly
# arg3 = replace         Example: nice
function regex_replace()
{
    # $1 = hello ugly world
    # $2 = bad|ugly
    # $3 = nice

    # REGEX
    # bash matches with POSIX ERE, which has no lazy quantifier (.*?),
    # so the leading (.*) is greedy and only the LAST match is replaced.
    # $2 must not contain capture groups, or the indices below shift.
    local re="(.*)($2)(.*)"

    if [[ $1 =~ $re ]]; then
        # if there is a match
        # ${BASH_REMATCH[0]} = hello ugly world
        # ${BASH_REMATCH[1]} = hello
        # ${BASH_REMATCH[2]} = ugly
        # ${BASH_REMATCH[3]} = world

        # hello + nice + world
        printf '%s\n' "${BASH_REMATCH[1]}$3${BASH_REMATCH[3]}"
    else
        # if no match return original input hello ugly world
        echo "$1"
    fi
}

# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'

# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"

exit
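The function above replaces a single match. As a rough sketch of how every match could be replaced in pure bash, the variant below peels off the leftmost match on each pass. The name regex_replace_all is hypothetical, and it assumes the regex is unanchored (no ^ or $) and that the replacement is plain text (no backreferences).

```shell
#!/usr/bin/env bash
# regex_replace_all INPUT ERE REPLACEMENT   (hypothetical helper name)
regex_replace_all() {
    local rest=$1 re=$2 rep=$3 out= m
    while [[ -n $rest && $rest =~ $re ]]; do
        m=${BASH_REMATCH[0]}         # text of the leftmost match
        [[ -n $m ]] || break         # an empty match would loop forever
        out+=${rest%%"$m"*}$rep      # text before the match, then the replacement
        rest=${rest#*"$m"}           # carry on after the match
    done
    printf '%s\n' "$out$rest"
}

regex_replace_all 'hello ugly bad world' 'bad|ugly' 'nice'   # hello nice nice world
```

Because already-processed text is moved into out and never rescanned, a replacement that itself matches the regex does not cause an endless loop.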
Upvotes: 0
Reputation: 111
I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple $1, $2, etc. backreferences...
#!/usr/bin/env bash
############################################
### resub - regex substitution in bash ###
############################################
resub() {
    local match="$1" subst="$2" tmp

    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi

    ### First, convert "$1" to "$BASH_REMATCH[1]" and 'single-quote' for later eval-ing...

    ### Utility function to 'single-quote' a list of strings
    squot() { local a=(); for i in "$@"; do a+=( $(echo \'${i//\'/\'\"\'\"\'}\' )); done; echo "${a[@]}"; }

    tmp=""
    while [[ $subst =~ (.*)\${([0-9]+)}(.*) ]] || [[ $subst =~ (.*)\$([0-9]+)(.*) ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"

    ### Now start (globally) substituting

    tmp=""
    while read line; do
        counter=0
        while [[ $line =~ $match(.*) ]]; do
            eval tmp='"${tmp}${line%${BASH_REMATCH[0]}}"'"${subst}"
            line="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
        done
        echo "${tmp}${line}"
    done
}

resub "$@"
##################
### EXAMPLES ###
##################
### % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
### The slow brown fox jumps slowly over the lazy dog
### % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
### The slow brown sheep jumps quickly over the lazy dog
### % animal="sheep"
### % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
### The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog
### % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
### one four three two five
### % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
### XXX two XXX four five
### % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
### XXX two YYY four five XXX six YYY seven eight
H/T to @Charles Duffy re: (.*)$match(.*)
Upvotes: 6
Reputation: 27553
Use sed:
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
Note that the subsequent -e's are processed in order. Also, the g flag for the expression will match all occurrences in the input.
You can also pick your favorite tool using this method, such as perl or awk, e.g.:
echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'
This may allow you to do more creative matches... For example, in the snippet above, the numeric replacement would not be applied unless the first expression matched (due to the short-circuit evaluation of and). And of course, you have the full language support of Perl to do your bidding...
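For the \s case raised in the question, a minimal sketch using sed's extended-regex mode with a POSIX character class. The variable names text and result are illustrative, and the here-string (<<<) assumes bash rather than plain sh.

```shell
#!/usr/bin/env bash
text='foo   bar baz'
# -E selects extended regular expressions; [[:space:]] is the portable spelling of \s
result=$(sed -E 's/[[:space:]]+/_/g' <<<"$text")
echo "$result"    # foo_bar_baz
```

Capturing the output with $(...) puts the substituted string back into a shell variable, which is usually what a search/replace on a variable needs.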
Upvotes: 236
Reputation: 1802
If you are making repeated calls and are concerned with performance, this test reveals that the bash method is ~15x faster than forking to sed, and likely faster than any other external process.
hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X
P1=$(date +%s)
for i in {1..10000}
do
    echo "$hello" | sed 's/X//g' > /dev/null
done
P2=$(date +%s)
echo $((P2 - P1))    # seconds spent forking sed
for i in {1..10000}
do
    echo "${hello//X/}" > /dev/null
done
P3=$(date +%s)
echo $((P3 - P2))    # seconds spent in pure bash
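Since date +%s only resolves whole seconds, a variation using bash's time keyword may give a finer-grained comparison. This is only a sketch: the function names strip_builtin and strip_sed are made up here, the loop count is arbitrary, and the actual timings will vary by machine.

```shell
#!/usr/bin/env bash
hello=123456789X123456789X

strip_builtin() { printf '%s\n' "${1//X/}"; }            # no fork
strip_sed()     { printf '%s\n' "$1" | sed 's/X//g'; }   # forks sed on every call

# Sanity check: the two must agree before their timings are compared.
[[ $(strip_builtin "$hello") == "$(strip_sed "$hello")" ]] && echo "results match"

time for i in {1..1000}; do strip_builtin "$hello" >/dev/null; done
time for i in {1..1000}; do strip_sed "$hello" >/dev/null; done
```

The time keyword reports real, user and sys time for each loop on stderr, so the cost of process creation shows up directly in the second measurement.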
Upvotes: 14
Reputation: 1292
Use [[:digit:]] (note the double brackets) as the pattern:
$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday
Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/22261334/2916086).
Upvotes: 13
Reputation: 8721
These examples also work in bash, with no need for sed:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X}
echo ${MYVAR//[0-9]/N}
You can also use the character class bracket expressions:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X}
echo ${MYVAR//[[:digit:]]/N}
output
XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX
What @Lanaru wanted to know, however, if I understand the question correctly, is why the "full" or PCRE extensions \s\S\w\W\d\D etc. don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and may not be compatible with other forms of shell-based regular expressions.
These don't work:
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'
Output, with all literal "d" characters removed:
ho02123ware38384you44334o3434ingto38384ay
but the following does work as expected
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'
output
howareyoudoingtodday
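As a rough guide, the common PCRE shorthands can be spelled with POSIX bracket expressions, which bash pattern substitution does understand. A small sketch, with the sample string and variable names invented for illustration:

```shell
#!/usr/bin/env bash
s='a1 b2_c3'                      # illustrative input
no_digits=${s//[[:digit:]]/}      # stands in for \d
no_space=${s//[[:space:]]/}       # stands in for \s
no_word=${s//[[:alnum:]_]/}       # stands in for \w
printf '[%s]\n' "$no_digits" "$no_space" "$no_word"
# [a b_c]
# [a1b2_c3]
# [ ]
```

The negated forms (\D, \S, \W) follow the same idea with a leading ! or ^ inside the outer brackets, e.g. [![:digit:]].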
Hope that clarifies things a bit more, but if you are not confused yet, why don't you try this on Mac OS X, which has the REG_ENHANCED flag enabled:
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day;
echo $MYVAR | grep -o -E '\d'
On most flavours of *nix you will only see the following output:
d
d
d
nJoy!
Upvotes: 151
Reputation: 295291
This actually can be done in pure bash:
hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
    hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"
...yields...
howareyoudoingtodday
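To see how this loop progresses, here is a small instrumented sketch on a shorter string (the variables s and n are added for illustration). Because the leading (.*) is greedy, it swallows as much as it can, so [0-9]+ typically ends up matching just the last remaining digit, and the loop strips the input from right to left.

```shell
#!/usr/bin/env bash
s=a12b3
re='(.*)[0-9]+(.*)'
n=0
while [[ $s =~ $re ]]; do
    s=${BASH_REMATCH[1]}${BASH_REMATCH[2]}   # drop the digit(s) matched between the two groups
    n=$((n + 1))
    echo "pass $n: $s"
done
echo "$s"    # ab
```

The loop ends as soon as no digit is left, so the final value is the same however many digits each pass removes.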
Upvotes: 177