green diod

Reputation: 1499

zsh length of a string with possibly unicode and escape characters

Context: I want to right-justify part of my prompt. My current approach is to compute the lengths of the left and right components and fill the middle with spaces.

Problem: Coping with %G (see prompt expansion) when the string may contain Unicode (for example, git status symbols). Possibly the actual problem is that I have not understood %G correctly. Its use was suggested in an answer in another thread as the way to signal to zsh that characters will be output, which may be the source of my confusion. The following snippet illustrates the problem:

strlen() {
    FOO=$1
    local invisible='%([BSUbfksu]|([FB]|){*})' # (1)
    LEN=${#${(S%%)FOO//$~invisible/}}
    echo $LEN
}

local blob="%{↓%G%}"
echo $blob $(strlen $blob) # (2) Unexpectedly gives 0

local blob="↓"
echo $blob $(strlen $blob) # (3) Gives the wanted output of 1 
                           # but then this result would tell us to not use %G for unicode

The strlen function comes from this tentative explanation of counting the user-visible length of a string. Unfortunately, there was no clear, complete explanation of the invisible pattern at # (1); any extra references or explanation would also be welcome.

Question: When should I really use %G? Or should I just ditch it as suggested by the above snippet?

Upvotes: 5

Views: 2764

Answers (2)

Roman Perepelitsa

Reputation: 2688

The following function computes the length of a string the same way prompt expansion does. Unlike other solutions, it handles all inputs correctly.

# Usage: prompt-length TEXT [COLUMNS]
#
# If you run `print -P TEXT`, how many characters will be printed
# on the last line?
#
# Or, equivalently, if you set PROMPT=TEXT with prompt_subst
# option unset, on which column will the cursor be?
#
# The second argument specifies terminal width. Defaults to the
# real terminal width.
#
# Assumes that `%{%}` and `%G` don't lie.
#
# Examples:
#
#   prompt-length ''            => 0
#   prompt-length 'abc'         => 3
#   prompt-length $'abc\nxy'    => 2
#   prompt-length '❎'          => 2
#   prompt-length $'\t'         => 8
#   prompt-length $'\u274E'     => 2
#   prompt-length '%F{red}abc'  => 3
#   prompt-length $'%{a\b%Gb%}' => 1
#   prompt-length '%D'          => 8
#   prompt-length '%1(l..ab)'   => 2
#   prompt-length '%(!.a.)'     => 1 if root, 0 if not
function prompt-length() {
  emulate -L zsh
  local COLUMNS=${2:-$COLUMNS}
  local -i x y=${#1} m
  if (( y )); then
    while (( ${${(%):-$1%$y(l.1.0)}[-1]} )); do
      x=y
      (( y *= 2 ))
    done
    while (( y > x + 1 )); do
      (( m = x + (y - x) / 2 ))
      (( ${${(%):-$1%$m(l.x.y)}[-1]} = m ))
    done
  fi
  echo $x
}

This function comes from Powerlevel10k ZSH theme where it's used to implement multi-line right prompt and responsive current directory truncation (demo). More info: Multi-line prompt: The missing ingredient.

Upvotes: 1

Adaephon

Reputation: 18329

Short answer:

You do not have to take any additional steps when using Unicode characters instead of plain ASCII. Current versions of zsh fully support Unicode characters and can handle them correctly. So even if a character is encoded by multiple bytes, zsh will still know that it is only a single character.


When to use %{...%} and %G

%{...%} tells zsh that the string inside does not change the cursor position. This is useful, for example, if you want to add escape sequences such as those used for setting colors:

print -P '%{\e[31m%}terminal red%{\e[0m%}'
print -P '%{\e[38;2;0;127;255m%}#007FFF%{\e[0m%}'

Without %{...%} zsh would have to assume that each character of the escape sequence moves the cursor one position to the right.

Using %G inside %{...%} (or %1{...%}) tells zsh to assume that a single character will be output. This is for counting purposes only; it will not move the cursor on its own.

According to the ZSH Manual:

This is useful when outputting characters that otherwise cannot be correctly handled by the shell, such as the alternate character set on some terminals.

As zsh is able to handle Unicode characters, it is unnecessary there (although not necessarily wrong).
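To see how %{...%} and %G affect the width zsh assumes, one can ask the prompt expander directly with the conditional %N(l.true-text.false-text), which tests whether at least N characters have been printed so far. A small sketch: at_least is a hypothetical helper name, and the fixed COLUMNS=80 is an assumption that keeps the short test strings on one line.

```zsh
# Hypothetical helper: succeeds if TEXT, expanded as a prompt,
# occupies at least N columns according to zsh's own counting.
at_least() {
  emulate -L zsh
  local COLUMNS=80             # assumed width; keeps the test strings on one line
  local -i n=$1
  local text=$2
  (( ${${(%):-$text%$n(l.1.0)}[-1]} ))
}

red=$'%{\e[31m%}red%{\e[0m%}'  # escape sequences marked as zero-width
at_least 3 $red && print -r -- "colored text: at least 3 columns"
at_least 4 $red || print -r -- "colored text: fewer than 4 columns"

arrow='%{↓%G%}'                # %G declares one column inside %{...%}
at_least 1 $arrow && print -r -- "arrow with %G: at least 1 column"
at_least 2 $arrow || print -r -- "arrow with %G: fewer than 2 columns"
```

Under these assumptions all four messages are printed: the color codes count as zero columns and the %G-marked arrow counts as exactly one.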


Reason for unexpected results of strlen "%{↓%G%}":

This is due to the fact that strlen really only tries to remove any null-length prompt sequences (like %B or %F{red}) instead of actually measuring the printed length of the resulting string (which is probably impossible anyway). In many cases this works well enough, but it fails spectacularly in the case of "%{↓%G%}", which is actually equivalent to "↓" in the context of zsh prompts.

Explanation:

To find these null-length prompt sequences, strlen matches its input against this pattern:

invisible='%([BSUbfksu]|([FB]|){*})'

This also contains the sub-pattern %{*}, which matches %{…%}. Then

LEN=${#${(S%%)FOO//$~invisible/}}

just removes any matching substring from FOO before counting the characters.

On top of that, it does not actually handle %G in any way and just removes it together with the surrounding %{...%}.

As the whole string "%{↓%G%}" matches the pattern, it will be completely removed, resulting in the unexpected character count of 0.


BTW: This does not mean that you should not use strlen (I have been using something derived from it in my prompt for quite some time). But you should be aware of some limitations:

  • It does not work with %G (obviously).
  • It cannot handle numeric arguments for %{...%} like %3{...%}.
  • It also does not recognize numeric arguments after % for foreground and background colors like %1F (instead of %F{1} or %F{red}).
  • It cannot handle nested %{...%}, or really any } inside %{...%}. (This matters, for example, if you intend to use %D{string} for date formatting, as the length of the format string would have to match the length of the resulting date without using %{...%} around it.)

Lastly, there was a bug in the original definition and it should be:

local invisible='%([BSUbfksu]|([FK]|){*})'

The second B should be a K as it is intended to match the prompt escape for background colors. (%B starts boldface mode)

Upvotes: 3
