Reputation: 4037
Think of strings, such as:
I have two apples
He has 4 apples
They have 10 pizzas
I would like to replace every number written in digits that I find in a string with a different value, calculated by an external script. In my case, the Python program digit_to_word.py converts a number from digits to its alphabetic form, but any example will do, as long as I can understand the process.
Expected output:
I have two apples
He has four apples
They have ten pizzas
Conceptually:
echo "He has four apples" |
while read word;
do
if [[ "$word" == +([0-9+]) ]]; then
NUM='${python digit_to_word.py "$word"}'
$word="$NUM"
fi
done |
other_operation... | etc..
I say conceptually because I did not get even close to making it work. It is hard for me to even find information on the issue, simply because I do not know exactly how to conceptualize it. At this point, I am mostly thinking about process substitution, but I am afraid it is not the best way.
Any hint would be really useful. Thanks in advance for sharing your knowledge with me!
Upvotes: 0
Views: 108
Reputation: 2297
Revision
This approach decomposes each line into two arrays - one for the words and one for the whitespace. Each line is then reconstructed by interleaving the array elements, with digits translated to words by the Python script. Thanks to @Charles Duffy for pointing out some common Bash pitfalls with my original answer.
while IFS= read -r line; do
    # Decompose the line into an array of words delimited by whitespace
    IFS=" " read -ra word_array <<< "$(echo "$line" | sed 's/[[:space:]]/ /g')"
    # Invert the decomposition, creating an array of whitespace delimited by words
    IFS="w" read -ra wspace_array <<< "$(echo "$line" | sed 's/\S/w/g' | tr -s 'w')"
    # Interleave the array elements in the output, translating digits to text
    for ((i=0; i<${#wspace_array[@]}; i++))
    do
        printf "%s" "${wspace_array[$i]}"
        if [[ "${word_array[$i]}" =~ ^[0-9]+$ ]]; then
            # Quote the argument so it is passed to the script as a single word
            printf "%s" "$(digit_to_word.py "${word_array[$i]}")"
        else
            printf "%s" "${word_array[$i]}"
        fi
    done
    printf "\n"
done < sample.txt
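For comparison, the same split-and-interleave idea can be sketched in Python. This is only an illustration, not part of the Bash answer: digit_to_word and its WORDS table are hypothetical stand-ins for digit_to_word.py and cover just a few values.

```python
import re

# Hypothetical stand-in for digit_to_word.py; only a few values are covered.
WORDS = {"4": "four", "8": "eight", "10": "ten"}

def digit_to_word(token):
    """Return the word for a known number, or the token unchanged."""
    return WORDS.get(token, token)

def translate_line(line):
    # The capturing group makes re.split keep the whitespace runs, so
    # even indexes hold words and odd indexes hold the separators.
    parts = re.split(r"(\s+)", line)
    for i in range(0, len(parts), 2):
        if re.fullmatch(r"[0-9]+", parts[i]):
            parts[i] = digit_to_word(parts[i])
    # Joining the parts restores the original whitespace exactly.
    return "".join(parts)

print(translate_line("He has 4 apples"))  # prints: He has four apples
```

Because the separators are kept as list elements, tabs and runs of spaces survive unchanged without a second array.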
Upvotes: 1
Reputation: 295443
regex='([[:space:]])([0-9]+)([[:space:]])'
echo "He has 4 apples" |
  while IFS= read -r line; do
    line=" ${line} "                             # pad with space so first and last words work consistently
    while [[ $line =~ $regex ]]; do              # loop while at least one replacement is pending
      pre_space=${BASH_REMATCH[1]}               # whitespace before the word, if any
      word=${BASH_REMATCH[2]}                    # actual word to replace
      post_space=${BASH_REMATCH[3]}              # whitespace after the word, if any
      replace=$(python digit_to_word.py "$word") # new word to use
      in=${pre_space}${word}${post_space}        # old word padded with whitespace
      out=${pre_space}${replace}${post_space}    # new word padded with whitespace
      line=${line//$in/$out}                     # replace old w/ new, keeping whitespace
    done
    line=${line#' '}; line=${line%' '}           # remove the padding we added earlier
    printf '%s\n' "$line"                        # write the output line
  done
This is careful to work even in some tricky corner cases:
"4 score and 14 years ago" only replaces the 4 in "4 score" with "four", and doesn't also modify the 4 in "14".
Use printf '1\t2 3\n' as your input, and you'll get a tab between "one" and "two", but a space between "two" and "three".
See this running at https://ideone.com/SOsuAD
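The same two guarantees (whole words only, whitespace preserved) can also be sketched in Python with lookarounds instead of padding. This is an illustration under assumptions: the function name translate_line and its command parameter are hypothetical, and the default command simply mirrors the python digit_to_word.py call from the question.

```python
import re
import subprocess

# A run of digits with no non-whitespace character directly before or
# after it, i.e. a number that is a whole whitespace-delimited word.
WHOLE_NUMBER = re.compile(r"(?<!\S)[0-9]+(?!\S)")

def translate_line(line, command=("python", "digit_to_word.py")):
    """Replace each standalone number with the output of an external command."""
    def convert(match):
        # Run the external converter with the number as its last argument.
        result = subprocess.run(
            [*command, match.group()],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    # Only the digits are substituted, so surrounding whitespace is untouched.
    return WHOLE_NUMBER.sub(convert, line)
```

Since the lookarounds consume no characters, tabs and spaces around each number are never rewritten, and a token such as a4 or 4b is left alone. Note that this starts one process per number, just like the Bash loop.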
Upvotes: 2
Reputation: 246827
I'd suggest this is a better job for perl.
To recreate the scenario:
$ cat digit_to_word.sh
case $1 in
4) echo four;;
8) echo eight;;
10) echo ten;;
*) echo "$1";;
esac
$ bash digit_to_word.sh 10
ten
Then this
perl -pe 's/(\d+)/ chomp($word = qx{bash digit_to_word.sh $1}); $word /ge' <<END
I have two apples
He has 4 apples
They have 10 pizzas but only 8 cookies
END
outputs
I have two apples
He has four apples
They have ten pizzas but only eight cookies
However, you've already got some Python, so why not implement the replacement part in Python too?
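A minimal sketch of what that could look like, assuming the conversion is available in-process: the WORDS table below is a hypothetical stand-in that mirrors the digit_to_word.sh stub, not the asker's real script.

```python
import re

# Hypothetical lookup mirroring the digit_to_word.sh stub above.
WORDS = {"4": "four", "8": "eight", "10": "ten"}

def replace_numbers(text):
    # re.sub calls the function once per match, much like Perl's s///ge;
    # numbers missing from the table are returned unchanged.
    return re.sub(r"[0-9]+", lambda m: WORDS.get(m.group(), m.group()), text)

sample = """I have two apples
He has 4 apples
They have 10 pizzas but only 8 cookies"""

print(replace_numbers(sample))
```

This avoids starting a new process for every number, which the qx{} call does.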
Upvotes: 2
Reputation: 5754
You could use sed for this. Here's an example:
$ echo "He has 4 apples" | sed 's/4/four/'
He has four apples
Looking at the example data though, sed might not be a good fit. If you see "1", you want to replace it with "one", but your example replaced "10" with "ten". Do you need to support multi-digit numbers, such as replacing "230" with "two hundred and thirty"?
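To show why multi-digit support goes beyond a fixed list of substitutions, here is a rough sketch of a converter for 0 to 999. The function name number_to_words and the wording style (with "and", hyphenated tens) are assumptions; the question does not say what digit_to_word.py actually produces.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer from 0 to 999 (illustrative range only)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    # Recurse for the part below one hundred, joined with "and".
    return words + (" and " + number_to_words(rest) if rest else "")

print(number_to_words(230))  # prints: two hundred and thirty
```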
Upvotes: 0