jfalkson

Reputation: 3739

Split by a word (case insensitive)

If I want to take

"hi, my name is foo bar"

and split it on "foo" case-insensitively (so that "foO", "FOO", "Foo", etc. all count as the separator), what should I do? Keep in mind that although the match itself should ignore case, I DO want to preserve the original case of the rest of the string.

So if I have:

test = "hi, my name is foo bar"

print(test.split('foo'))

print(test.upper().split("FOO"))

I would get

['hi, my name is ', ' bar']
['HI, MY NAME IS ', ' BAR']

respectively.

But what I want is:

['hi, my name is ', ' bar']

every time. The goal is to preserve the case of the original string and ignore case only for the word I am splitting on.

So if my test string was:

"hI MY NAME iS FoO bar"

my desired result would be:

['hI MY NAME iS ', ' bar']

Upvotes: 22

Views: 21804

Answers (4)

ShadowRanger

Reputation: 155428

A highly unreasonable solution for when you want to split precisely once (bonus: it preserves the case of the separator found too, so you know what you actually split on):

teststr = "hI MY NAME iS FoO bar"
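# Slice forward from the separator's length; negative slices (-alen) break when nothing follows the separator
blen, slen, _ = map(len, teststr.casefold().partition("foo"))
before, sep, after = teststr[:blen], teststr[blen:blen + slen], teststr[blen + slen:]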
print([before, sep, after])

which outputs (including the original separator from the string, which you can discard if you like):

['hI MY NAME iS ', 'FoO', ' bar']


Similar code could be written with teststr.casefold().index("foo") plus slices based on the index and the length of the separator; I like partition doing more of the work for me (and working branchlessly whether or not the separator appears in the string), but it's purely personal preference.
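For comparison, here is a rough sketch of the index-based variant described above. The helper name partition_casefold is made up for illustration, and it uses str.find rather than str.index so a missing separator needs no try/except. It assumes casefolding does not change the length of either string, which holds for ASCII but not for characters such as "ß".

```python
def partition_casefold(text, sep):
    """Case-insensitive take on str.partition that keeps the original casing.

    Hypothetical helper; assumes casefold() leaves string lengths unchanged.
    """
    # find() returns -1 when the separator is absent instead of raising
    pos = text.casefold().find(sep.casefold())
    if pos < 0:
        # Mirror str.partition: whole string, then two empty strings
        return text, "", ""
    end = pos + len(sep)
    return text[:pos], text[pos:end], text[end:]


print(partition_casefold("hI MY NAME iS FoO bar", "foo"))
# ('hI MY NAME iS ', 'FoO', ' bar')
```

The explicit branch for the not-found case is the price of using an index instead of partition.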

This could be adapted to splitting an arbitrary number of times (but losing the separators since str.split discards them, unlike str.partition) with itertools helpers:

from itertools import accumulate, pairwise  # pairwise lives in itertools on 3.10+; before that, use the pairwise recipe from the itertools docs
                                            # accumulate only accepts the initial argument
                                            # on 3.8+, but you can fake it with chain before that

teststr = "hI MY NAME iS FoO bar aNd I LOvE fOoD!"
# Each earlier separator shifts the later pieces by 3 == len("foo") characters
components = [teststr[s+i*3:e+i*3]
              for i, (s, e) in enumerate(pairwise(accumulate(map(len, teststr.casefold().split("foo")), initial=0)))]
print(components)

which produces:

['hI MY NAME iS ', ' bar aNd I LOvE ', 'D!']
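For older interpreters, the two fallbacks mentioned in the comments might look roughly like this. The pairwise function follows the recipe in the itertools documentation, and chain([0], ...) stands in for initial=0. The wrapper name split_casefold is an assumption for illustration, and the same caveat applies: it only works when casefolding does not change string lengths.

```python
from itertools import accumulate, chain, tee


def pairwise(iterable):
    # Recipe from the itertools docs: s -> (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


def split_casefold(text, sep):
    """Hypothetical wrapper around the offset trick, for Python < 3.8."""
    sep_len = len(sep)
    piece_lens = map(len, text.casefold().split(sep.casefold()))
    # Prepending 0 with chain replaces accumulate(..., initial=0)
    offsets = chain([0], accumulate(piece_lens))
    # Piece i sits after i separators, so shift its bounds by i * sep_len
    return [text[s + i * sep_len:e + i * sep_len]
            for i, (s, e) in enumerate(pairwise(offsets))]


print(split_casefold("hI MY NAME iS FoO bar aNd I LOvE fOoD!", "foo"))
# ['hI MY NAME iS ', ' bar aNd I LOvE ', 'D!']
```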


To be clear, the first solution is unreasonable, and the second solution is insane. I post this largely for illustration; this is a case where the non-regex solutions are nuts, so even if you agree with me that regex should be avoided when possible, this is a case to bite the bullet and use them.

Upvotes: 0

Nick Legend

Reputation: 1048

This is not an exact answer to the question, but a solution built on the same idea. After searching the web for a while, I implemented the following.

This is my custom tag (see how to do it).

from django.template.defaultfilters import register
from django.utils.html import escape
from django.utils.safestring import mark_safe

@register.simple_tag
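def highlight(string, entry, prefix, suffix):
    # Escape first, so everything except prefix/suffix is safe to render
    string = escape(string)
    entry = escape(entry)
    if not entry:
        # An empty search term would match without ever advancing (infinite loop)
        return mark_safe(string)
    # Search in upper-cased copies, but slice the original to keep its case
    string_upper = string.upper()
    entry_upper = entry.upper()
    result = ''
    length = len(entry)
    start = 0
    pos = string_upper.find(entry_upper, start)
    while pos >= 0:
        result += string[start:pos]
        start = pos + length
        result += prefix + string[pos:start] + suffix
        pos = string_upper.find(entry_upper, start)
    result += string[start:]
    return mark_safe(result)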

It accepts an unsafe string and returns the escaped result.
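Outside Django, the same highlighting idea can be sketched with only the standard library. This is an illustrative variant, not the tag above: it swaps the manual find loop for re.sub with re.IGNORECASE, uses html.escape in place of Django's escape, and returns a plain str (there is no mark_safe equivalent here). The name highlight_plain is made up.

```python
import html
import re


def highlight_plain(text, entry, prefix, suffix):
    """Wrap every case-insensitive match of entry in prefix/suffix.

    Hypothetical stdlib-only sketch; prefix and suffix are trusted markup.
    """
    escaped_text = html.escape(text)
    escaped_entry = html.escape(entry)
    if not escaped_entry:
        # Nothing to search for, so return the escaped text unchanged
        return escaped_text
    pattern = re.compile(re.escape(escaped_entry), re.IGNORECASE)
    # A function replacement keeps backslashes in prefix/suffix literal
    return pattern.sub(lambda m: prefix + m.group(0) + suffix, escaped_text)


print(highlight_plain("Foo & foo", "foo", "<b>", "</b>"))
# <b>Foo</b> &amp; <b>foo</b>
```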

Use it this way:

<span class="entityCode">{% highlight entity.code search_text '<span class="highlighted">' '</span>' %}</span>

I use a nested <span> so that the highlighted text inherits all the surrounding styles.

Upvotes: 1

Christian Mosz

Reputation: 543

You can also search for the keyword to get its start position, and then cut it out with a substring method. That is what I would recommend. (I come from C#, so I don't know what the method is called in this language.)
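In Python, the "search, then cut" approach would probably use str.find on a lowercased copy plus slicing on the original. A minimal sketch, assuming lowercasing does not change the string's length (true for ASCII text); the function name split_ignore_case is made up for illustration:

```python
def split_ignore_case(text, word):
    """Split text on every case-insensitive occurrence of word.

    Hypothetical helper; assumes lower() leaves string lengths unchanged.
    """
    if not word:
        # Same behaviour as str.split with an empty separator
        raise ValueError("empty separator")
    lowered, target = text.lower(), word.lower()
    parts, start = [], 0
    pos = lowered.find(target, start)
    while pos >= 0:
        # Positions come from the lowered copy, slices from the original
        parts.append(text[start:pos])
        start = pos + len(target)
        pos = lowered.find(target, start)
    parts.append(text[start:])
    return parts


print(split_ignore_case("hI MY NAME iS FoO bar", "foo"))
# ['hI MY NAME iS ', ' bar']
```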

Upvotes: -1

user2555451

Reputation:

You can use the re.split function with the re.IGNORECASE flag (or re.I for short):

>>> import re
>>> test = "hI MY NAME iS FoO bar"
>>> re.split("foo", test, flags=re.IGNORECASE)
['hI MY NAME iS ', ' bar']
>>>
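If the word to split on comes from user input, it is probably worth passing it through re.escape so that characters like "." or "+" are matched literally. The wrapper below is only a sketch; the name split_on_word and its keep_separator parameter are made up for illustration, while re.split, re.escape, maxsplit and flags are the standard library's own.

```python
import re


def split_on_word(text, word, maxsplit=0, keep_separator=False):
    """Case-insensitive split on a literal word (hypothetical helper)."""
    # re.escape stops regex metacharacters in word from being interpreted
    pattern = re.escape(word)
    if keep_separator:
        # A capturing group makes re.split return the matched text as well
        pattern = "(" + pattern + ")"
    return re.split(pattern, text, maxsplit=maxsplit, flags=re.IGNORECASE)


print(split_on_word("hI MY NAME iS FoO bar", "foo"))
# ['hI MY NAME iS ', ' bar']
print(split_on_word("hI MY NAME iS FoO bar", "foo", keep_separator=True))
# ['hI MY NAME iS ', 'FoO', ' bar']
print(split_on_word("a.B.c+d", "."))
# ['a', 'B', 'c+d']
```

Passing maxsplit=1 gives the "split precisely once" behaviour without any index arithmetic.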

Upvotes: 49
