Worice

Reputation: 4037

Modify piped input

Think of strings, such as:

I have two apples
He has 4 apples 
They have 10 pizzas

I would like to replace every number I find in a string with a different value, calculated by an external script. In my case, the Python program digit_to_word.py converts a number written in digits to its spelled-out form, but any example will do as long as I can understand the process.

Expected output:

I have two apples
He has four apples 
They have ten pizzas

Conceptually:

echo "He has 4 apples" |
while read word;
do
    if [[ "$word" == +([0-9+]) ]]; then
    NUM='${python digit_to_word.py "$word"}'
    $word="$NUM"
fi
done |
other_operation... | etc..

I say conceptually because I did not get even close to making it work. It is hard for me to even find information on the issue, simply because I do not know exactly how to conceptualize it. At this point, I am mostly thinking about process substitution, but I am afraid it is not the best way.

Any hint would be really useful. Thanks in advance for sharing your knowledge with me!

Upvotes: 0

Views: 108

Answers (4)

cdub

Reputation: 2297

Revision

This approach decomposes each line into two arrays - one for the words and one for the whitespace. Each line is then reconstructed by interleaving the array elements, with digits translated to words by the Python script. Thanks to @Charles Duffy for pointing out some common Bash pitfalls with my original answer.

while IFS= read -r line; do
  # Decompose the line into an array of words delimited by whitespace
  IFS=" " read -ra word_array <<< "$(echo "$line" | sed 's/[[:space:]]/ /g')"

  # Invert the decomposition, creating an array of whitespace delimited by words
  IFS="w" read -ra wspace_array <<< "$(echo "$line" | sed 's/\S/w/g' | tr -s 'w')"

  # Interleave the array elements in the output, translating digits to text
  for ((i=0; i<${#wspace_array[@]}; i++))
  do
    printf "%s" "${wspace_array[$i]}"
    if [[ "${word_array[$i]}" =~ ^[0-9]+$ ]]; then
      printf "%s" "$(digit_to_word.py "${word_array[$i]}")"
    else
      printf "%s" "${word_array[$i]}"
    fi
  done
  printf "\n"
done < sample.txt
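
For comparison, the same split-and-interleave idea can be sketched in Python, where a capturing group in re.split keeps the whitespace runs alongside the words. This is only a sketch under assumptions: the WORDS table is a hypothetical stand-in for digit_to_word.py, and translate_line is a name made up for illustration.

```python
import re

# Hypothetical stand-in for digit_to_word.py; only a few values are covered.
WORDS = {"4": "four", "8": "eight", "10": "ten"}

def translate_line(line):
    # The capturing group makes re.split keep the whitespace runs, so the
    # result alternates: word, whitespace, word, whitespace, ...
    pieces = re.split(r"(\s+)", line)
    # Swap in the word for any piece found in the table; every other piece
    # (including the whitespace) is written back unchanged.
    return "".join(WORDS.get(piece, piece) for piece in pieces)

print(repr(translate_line("He has 4\tapples ")))
# prints 'He has four\tapples '
```

Because the separators are kept as list elements, tabs and trailing whitespace survive the round trip.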

Upvotes: 1

Charles Duffy

Reputation: 295443

regex='([[:space:]])([0-9]+)([[:space:]])'

echo "He has 4 apples" |
while IFS= read -r line; do
  line=" ${line} "  # pad with space so first and last words work consistently
  while [[ $line =~ $regex ]]; do       # loop while at least one replacement is pending
    pre_space=${BASH_REMATCH[1]}                # whitespace before the word, if any
    word=${BASH_REMATCH[2]}                     # actual word to replace
    post_space=${BASH_REMATCH[3]}               # whitespace after the word, if any
    replace=$(python digit_to_word.py "$word")  # new word to use
    in=${pre_space}${word}${post_space}         # old word padded with whitespace
    out=${pre_space}${replace}${post_space}     # new word padded with whitespace
    line=${line//$in/$out}                      # replace old w/ new, keeping whitespace
  done
  line=${line#' '}; line=${line%' '}            # remove the padding we added earlier
  printf '%s\n' "$line"                         # write the output line
done

This is careful to work even in some tricky corner cases:

  • 4 score and 14 years ago only replaces the 4 in 4 score with four, and doesn't also modify the 4 in 14.
  • Input that mixes tabs and spaces generates output with the same kinds of whitespace; use printf '1\t2 3\n' as your input, and you'll get a tab between one and two, but a space between two and three.

See this running at https://ideone.com/SOsuAD
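
The two corner cases above can also be illustrated with a short Python sketch that uses lookarounds in place of the padding trick. It is not part of the Bash solution: the WORDS table is a hypothetical stand-in for digit_to_word.py, and the names TOKEN and replace_numbers are invented for the example.

```python
import re

# Hypothetical stand-in for digit_to_word.py.
WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "14": "fourteen"}

# A run of digits with no non-whitespace character directly before or after it.
TOKEN = re.compile(r"(?<!\S)[0-9]+(?!\S)")

def replace_numbers(line):
    # Only the digits are substituted, so the surrounding whitespace is untouched.
    return TOKEN.sub(lambda m: WORDS.get(m.group(), m.group()), line)

print(replace_numbers("4 score and 14 years ago"))  # four score and fourteen years ago
print(repr(replace_numbers("1\t2 3")))              # 'one\ttwo three'
print(replace_numbers("room4b has 4 doors"))        # room4b has four doors
```

Here 14 is looked up as a whole token, so the 4 inside it is never matched on its own, and digits embedded in a longer word are left alone.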

Upvotes: 2

glenn jackman

Reputation: 246827

I'd suggest this is a better job for perl.

To recreate the scenario:

$ cat digit_to_word.sh
case $1 in
4) echo four;;
8) echo eight;;
10) echo ten;;
*) echo "$1";;
esac
$ bash digit_to_word.sh 10
ten

Then this

perl -pe 's/(\d+)/ chomp($word = qx{bash digit_to_word.sh $1}); $word /ge' <<END
I have two apples
He has 4 apples
They have 10 pizzas but only 8 cookies
END

outputs

I have two apples
He has four apples
They have ten pizzas but only eight cookies

However, since you already have some Python, why not implement the replacement part in Python too?
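
A minimal sketch of what that could look like, assuming the conversion logic can be called as a function. The digit_to_word function and its WORDS table below are hypothetical stand-ins for whatever digit_to_word.py actually does, and convert is a name chosen for illustration.

```python
import re

# Hypothetical stand-in for the logic inside digit_to_word.py.
WORDS = {"4": "four", "8": "eight", "10": "ten"}

def digit_to_word(number):
    # Fall back to the original digits when no word is known.
    return WORDS.get(number, number)

def convert(lines):
    # Yield each line with every run of digits passed through digit_to_word.
    for line in lines:
        yield re.sub(r"[0-9]+", lambda m: digit_to_word(m.group()), line)

sample = [
    "I have two apples\n",
    "He has 4 apples\n",
    "They have 10 pizzas but only 8 cookies\n",
]
print("".join(convert(sample)), end="")
```

Passing sys.stdin to convert instead of the sample list and writing the result to sys.stdout would turn this into a filter that can sit in the middle of a pipeline, with a single Python process for the whole input rather than one per number.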

Upvotes: 2

Kaan

Reputation: 5754

You could use sed for this. Here's an example:

$ echo "He has 4 apples" | sed 's/4/four/'
He has four apples

Looking at the example data, though, sed might not be a good fit. If you see "1", you want to replace it with "one", but your example replaces "10" with "ten". Do you need to support multi-digit numbers, such as replacing "230" with "two hundred and thirty"?
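
If multi-digit numbers do need support, a fixed list of sed substitutions will not scale, and the conversion has to be computed. Below is a rough Python sketch for 0 to 999 only; number_to_words and replace_numbers are hypothetical names and are not taken from the asker's digit_to_word.py.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + number_to_words(rest) if rest else "")

def spell_match(match):
    digits = match.group()
    # Numbers longer than three digits are outside this sketch; keep them as-is.
    return number_to_words(int(digits)) if len(digits) <= 3 else digits

def replace_numbers(line):
    # Match whole runs of digits so that "10" is never treated as "1" and "0".
    return re.sub(r"[0-9]+", spell_match, line)

print(replace_numbers("They have 230 pizzas"))
# prints: They have two hundred and thirty pizzas
```

Matching the whole run of digits at once is what avoids the "1" versus "10" problem described above.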

Upvotes: 0
