vincent-lg

Reputation: 559

Handling escape characters in a string

There are strings from the user input I need to convert. The use case is pretty simple:

In theory, this is no big problem. I use Python, but I'm sure it is just as easy with regular expressions in other languages.

import re

def get_lines(text):
    """Return a list of lines (list of str)."""
    command_stacking = ";"
    delimiter = re.escape(command_stacking)
    re_del = re.compile("(?<!{s}){s}(?!{s})".format(s=delimiter), re.UNICODE)
    chunks = re_del.split(text)

    # Clean the double delimiters
    for i, chunk in enumerate(chunks):
        chunks[i] = chunk.replace(2 * command_stacking, command_stacking)

    return chunks

That seems to work:

>>> get_lines("first line;second line;third line with;;a semicolon")
['first line', 'second line', 'third line with;a semicolon']
>>>

But when there are three or four semicolons, it doesn't behave as expected.

The regular expression ignores runs of several semicolons (as it should), but when replacing ;; with ;, a run of ;;; becomes ;; and a run of ;;;; also becomes ;;, and so on. It would be great if each run lost exactly one semicolon instead: 2 replaced by 1, 3 by 2, 4 by 3... That's something I could explain to my users.
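To make the behaviour concrete: str.replace works left to right on non-overlapping pairs, so a run of n semicolons ends up roughly half as long rather than n - 1. A quick check (the name collapsed is only for illustration):

```python
# str.replace consumes non-overlapping ";;" pairs from left to right,
# so a run of n semicolons shrinks to ceil(n / 2), not to n - 1.
collapsed = {}
for n in range(2, 6):
    run = ";" * n
    collapsed[n] = len(run.replace(";;", ";"))

print(collapsed)  # {2: 1, 3: 2, 4: 2, 5: 3}
```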

What would be the best solution to do that?

Thanks for your help.

Upvotes: 2

Views: 434

Answers (3)

wwii

Reputation: 23743

The repl argument of re.sub can be a function.

>>> import re
>>> s = 'a;;b;;;c;;;;d'
>>> pattern = ';{2,}'
>>> def f(m):
...     return m.group(0)[1:]
...
>>> re.sub(pattern, f, s)
'a;b;;c;;;d'
>>> 
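Plugged into the function from the question, this could look something like the sketch below. It assumes the delimiter stays configurable as in the original get_lines; the command_stacking parameter and the helper drop_one are my own naming, not something from the question.

```python
import re

def get_lines(text, command_stacking=";"):
    """Split on single delimiters and shorten each longer run by one."""
    s = re.escape(command_stacking)
    # A delimiter that is neither preceded nor followed by another one.
    single = re.compile("(?<!{s}){s}(?!{s})".format(s=s))
    # Two or more delimiters in a row.
    run = re.compile("(?:{s}){{2,}}".format(s=s))

    def drop_one(match):
        # Keep the run but remove its first delimiter.
        return match.group(0)[len(command_stacking):]

    return [run.sub(drop_one, chunk) for chunk in single.split(text)]

print(get_lines("first line;second line;third line with;;a semicolon"))
print(get_lines("a;;;b;c;;;;d"))
```

The first call prints ['first line', 'second line', 'third line with;a semicolon'] and the second prints ['a;;b', 'c;;;d'], so 2 becomes 1, 3 becomes 2 and 4 becomes 3.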

Upvotes: 1

nu11p01n73R

Reputation: 26667

You can use re.split with lookarounds.

Example

>>> import re
>>> string = 'first line;second line;third line with;;a semicolon'
>>> re.split(r'(?<!;);(?!;)', string)
['first line', 'second line', 'third line with;;a semicolon']

Regex

  • (?<!;) Negative lookbehind. Checks that the ; is not preceded by another ;
  • ; Matches the ;
  • (?!;) Negative lookahead. Checks that the ; is not followed by another ;

>>> [x.replace(';;', ';') for x in re.split(r'(?<!;);(?!;)', string)]
['first line', 'second line', 'third line with;a semicolon']

Upvotes: 0

Batman

Reputation: 8917

Instead of the string replace method, use re.sub() with count=1, so that only the first match is replaced:

import re
re.sub(';;', ';', 'foo;;;bar', count=1)

https://docs.python.org/2/library/re.html#re.sub
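One thing to keep in mind with this approach: count limits the number of substitutions over the whole string, so only the first run in each chunk is shortened. A small check with made-up inputs:

```python
import re

# count caps the total number of substitutions in the whole string.
one_run = re.sub(";;", ";", "foo;;;bar", count=1)
two_runs = re.sub(";;", ";", "a;;b;;c", count=1)

print(one_run)   # foo;;bar
print(two_runs)  # a;b;;c
```

If a chunk can contain several escaped semicolons, the second run is left untouched.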

Upvotes: 1
