Reputation: 2509
I have a string of several words in bash called comp_line, which can have any number of spaces inside. For example:
"foo bar apple banana q xy"
And I have a zero-based index comp_point pointing to one character in that string, e.g. if comp_point is 4, it points to the 'b' in 'bar'.
Based on comp_point and comp_line alone, I want to extract the word being pointed to by the index, where a "word" is a sequence of letters, numbers, punctuation or any other non-whitespace characters, surrounded by whitespace on either side. (If the word is at the start or end of the string, or is the only word in the string, it should work the same way.)
The word I'm trying to extract will become cur (the current word).
Based on this, I've come up with a set of rules:
Read the current character curchar, the previous character prevchar, and the next character nextchar. Then:
1. If curchar is a graph character (non-whitespace), set cur to curchar plus the characters before and after it (stopping at whitespace or the string start/end on either side).
2. Else, if prevchar is a graph character, set cur to the characters from the previous character backwards, up to the preceding whitespace character or the string start.
3. Else, if nextchar is a graph character, set cur to the characters from the next character forwards, up to the next whitespace character or the string end.
4. If none of the above conditions are hit (meaning curchar, nextchar and prevchar are all whitespace characters), set cur to "" (the empty string).
I've written some code which I think achieves this. Rules 2, 3 and 4 are relatively straightforward, but rule 1 is the most difficult to implement - I've had to do some complicated string slicing. I'm not convinced that my solution is in any way ideal, and want to know if anyone knows of a better way to do this within bash only (not outsourcing to Python or another easier language.)
Tested on https://rextester.com/l/bash_online_compiler
#!/bin/bash
# GNU bash, version 4.4.20
comp_line="foo bar apple banana q xy"
comp_point=19
cur=""
curchar=${comp_line:$comp_point:1}
# At comp_point 0 there is no previous character; a negative offset
# would wrap around to the end of the string instead.
prevchar=""
if ((comp_point > 0)); then
    prevchar=${comp_line:$((comp_point - 1)):1}
fi
nextchar=${comp_line:$((comp_point + 1)):1}
echo "<$prevchar> <$curchar> <$nextchar>"
if [[ $curchar =~ [[:graph:]] ]]; then
    # Rule 1 - Extract current word
    slice="${comp_line:$comp_point}"      # from the index to the end
    endslice="${slice%% *}"               # rest of the word after the index
    slice="${slice#"$endslice"}"          # everything after the word
    slice="${comp_line%"$slice"}"         # everything up to the end of the word
    cur="${slice##* }"                    # last word of that prefix
elif [[ $prevchar =~ [[:graph:]] ]]; then
    # Rule 2 - Extract previous word
    slice="${comp_line::$comp_point}"
    cur="${slice##* }"
elif [[ $nextchar =~ [[:graph:]] ]]; then
    # Rule 3 - Extract next word
    slice="${comp_line:$comp_point+1}"
    cur="${slice%% *}"
else
    # Rule 4 - Set cur to empty string ""
    cur=""
fi
echo "Cur: <$cur>"
The current example will return 'banana', as comp_point is set to 19.
I'm sure there must be a neater way to do this, or some trick that I've missed. It works so far, but there may be edge cases I haven't thought of. Can anyone advise if there's a better way to do it?
(The XY problem, if anyone asks)
I'm writing a tab completion script, and trying to emulate the functionality of COMP_WORDS and COMP_CWORD, using COMP_LINE and COMP_POINT. When a user is typing a command to tab complete, I want to work out which word they are trying to tab complete just based on the latter two variables. I don't want to outsource this code to Python because performance takes a big hit when Python is involved in tab complete.
Upvotes: 0
Views: 253
Reputation: 2471
Another way to do it in bash, without using an array.
#!/bin/bash
string="foo bar apple banana q xy"
wordAtIndex() {
    local index=$1 string=$2 ret='' last first
    if [ "${string:index:1}" != " " ] ; then
        last="${string:index}"       # from the index to the end
        first="${string:0:index}"    # everything before the index
        ret="${first##* }${last%% *}"
    fi
    echo "$ret"
}
for ((i=0; i < ${#string}; ++i)); do
    printf '%s <-- "%s"\n' "${string:i:1}" "$(wordAtIndex "$i" "$string")"
done
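Note that this function returns an empty string whenever the index sits on a space, so it covers rules 1 and 4 of the question only. One possible way to add rules 2 and 3 is sketched below. The name wordNearIndex is made up for this illustration, and, like the function above, the sketch assumes the space character is the only separator.

```shell
#!/bin/bash
# Hypothetical helper (not part of the answer above): like wordAtIndex,
# but falls back to the word ending just before the index (rule 2),
# then to the word starting just after it (rule 3).
wordNearIndex() {
    local index=$1 string=$2 first last
    first="${string:0:index}"    # everything before the index
    if [ "${string:index:1}" != " " ]; then
        # Rule 1: the index is inside a word.
        last="${string:index}"
        echo "${first##* }${last%% *}"
        return
    fi
    # Rule 2: a word ends immediately before the index.
    if [ -n "${first##* }" ]; then
        echo "${first##* }"
        return
    fi
    # Rule 3: a word starts immediately after the index.
    # If the next character is also a space this prints "" (rule 4).
    last="${string:index+1}"
    echo "${last%% *}"
}

wordNearIndex 7 "foo bar   apple"   # index 7 is the space after 'bar'; prints: bar
```

Rule 4 needs no branch of its own here: when both neighbours are spaces, the last expansion is already empty.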
Upvotes: 2
Reputation: 27215
"if anyone knows of a better way to do this within bash only"
Use regexes. With ^.{4} you can skip the first four characters to navigate to index 4. With [[:graph:]]* you can match the rest of the word at that index. The * is greedy and will match as many graphical characters as possible.
wordAtIndex() {
    local index=$1 string=$2 left right indexFromRight
    # Part of the word from the index onwards
    [[ "$string" =~ ^.{$index}([[:graph:]]*) ]]
    right=${BASH_REMATCH[1]}
    # Part of the word up to and including the index
    ((indexFromRight=${#string}-index-1))
    [[ "$string" =~ ([[:graph:]]*).{$indexFromRight}$ ]]
    left=${BASH_REMATCH[1]}
    echo "$left${right:1}"
}
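To see what each of the two matches captures on its own, here is a small sketch that runs them outside the function. The sample string and the index 5 are chosen only for this illustration.

```shell
#!/bin/bash
string="foo bar apple"
index=5   # points at the 'a' in 'bar'

# Forward match: skip $index characters, capture the rest of the word.
[[ $string =~ ^.{$index}([[:graph:]]*) ]]
right=${BASH_REMATCH[1]}

# Backward match: capture the part of the word that ends at $index.
((indexFromRight = ${#string} - index - 1))
[[ $string =~ ([[:graph:]]*).{$indexFromRight}$ ]]
left=${BASH_REMATCH[1]}

echo "left=<$left> right=<$right>"   # prints: left=<ba> right=<ar>
```

Both captures include the character at the index itself, which is why the function drops the first character of right before concatenating.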
And here is a full test for your example:
string="foo bar   apple  banana q xy"
for ((i=0; i < ${#string}; ++i)); do
    printf '%s <-- "%s"\n' "${string:i:1}" "$(wordAtIndex "$i" "$string")"
done
This prints the input string vertically on the left and, for each index, the word that index points to on the right.
f <-- "foo"
o <-- "foo"
o <-- "foo"
<-- ""
b <-- "bar"
a <-- "bar"
r <-- "bar"
<-- ""
<-- ""
<-- ""
a <-- "apple"
p <-- "apple"
p <-- "apple"
l <-- "apple"
e <-- "apple"
<-- ""
<-- ""
b <-- "banana"
a <-- "banana"
n <-- "banana"
a <-- "banana"
n <-- "banana"
a <-- "banana"
<-- ""
q <-- "q"
<-- ""
x <-- "xy"
y <-- "xy"
Upvotes: 1