Reputation: 542
I have a list of strings that is already sorted in case-insensitive order. I would like to insert a new string into the list. One way to do this is to append the item and then sort the list, like so:
myList.append('Something')
myList.sort(key=lambda s: s.lower())
But I was wondering if there is a way to just insert the item in the correct position without sorting the whole thing again.
I found this question: Insert an item into a sorted list in Python. It points towards Python's bisect module. But that module doesn't look like it can support case-insensitivity.
Edit: I tested out several of the answers listed here.
lower()
on every item on the list.lower()
on every element.It was a close call to accept an answer. In the end, I went with Stefan Pochmann's answer because it was the best for a one-time insertion, and accessing the resulting list does not require accessing a member variable. However, use cases vary, so be sure to examine all the answers.
Upvotes: 6
Views: 2164
Reputation: 28636
I see you added test results to you question. I did some benchmarks as well now and get a similar picture:
Insorting 20000 words:
80.224 seconds with insort_sorting
0.166 seconds with insort_own_binary_search
70.294 seconds with insort_lower_all
0.094 seconds with insort_keep_lower
You're somewhat wrong about the fast two, though. With larger numbers of insertions, mine becomes faster. About twice as fast:
Insorting 1000000 words:
92.712 seconds with insort_own_binary_search
173.577 seconds with insort_keep_lower
That's because the O(log n) time for searching the index becomes negligible and the time is dominated by the O(n) time for the insert
calls. And my solution has only one of those while the other solution has two.
Another difference is space complexity, keeping an extra list of lowered versions of all strings isn't as good.
Here's my benchmarking code:
import random, string, time
#--------------------------------------------------------------
def insort_sorting(a, x):
a.append(x)
a.sort(key=str.lower)
#--------------------------------------------------------------
def insort_own_binary_search(a, x):
key = x.lower()
lo, hi = 0, len(myList)
while lo < hi:
mid = (lo + hi) // 2
if key < a[mid].lower():
hi = mid
else:
lo = mid + 1
a.insert(lo, x)
#--------------------------------------------------------------
from bisect import bisect
def insort_lower_all(a, x):
index = bisect([i.lower() for i in a], x.lower())
a.insert(index, x)
#--------------------------------------------------------------
from bisect import bisect
def insort_keep_lower(a, x, lower=[]):
x_lower = x.lower()
index = bisect(lower, x_lower)
a.insert(index, x)
lower.insert(index, x_lower)
#--------------------------------------------------------------
# Generate random words
words = [''.join(random.choice(string.ascii_letters) for _ in range(10))
for _ in range(20000)]
# for _ in range(1000000)]
# Compare the solutions
print 'Insorting', len(words), 'words:'
reference = None
for insort in insort_sorting, insort_own_binary_search, insort_lower_all, insort_keep_lower:
#for insort in insort_own_binary_search, insort_keep_lower:
t0 = time.time()
myList = []
for word in words:
insort(myList, word)
print '%7.3f seconds with %s' % (time.time() - t0, insort.__name__)
if reference is None:
reference = myList
else:
assert myList == reference
Upvotes: 1
Reputation: 28636
It's a good opportunity to practice binary search again (or just copy&paste&modify bisect.insort
, which is what I did):
def insort_case_insensitive(a, x):
key = x.lower()
lo, hi = 0, len(myList)
while lo < hi:
mid = (lo + hi) // 2
if key < a[mid].lower():
hi = mid
else:
lo = mid + 1
a.insert(lo, x)
Demo:
myList = ['a', 'b', 'c', 'd', 'e']
for x in 'A', 'B', 'C', 'D', 'E':
insort_case_insensitive(myList, x)
print myList
Prints:
['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E']
It's O(n), just like append+sort would be, but only because of the a.insert(lo, x)
at the end. Which is dead-simple and done in C, so it's super fast. The binary search of course only takes O(log n) steps, so that's super fast as well. The append+sort way would call .lower()
on all elements and compare them, both of which is much slower. @MoinuddinQuadri's first solution is also much slower because of calling .lower()
on all elements.
See my other answer for a benchmarking comparison.
Upvotes: 5
Reputation: 9008
You could create your own type to encapsulate this behavior (combined with bisect
as suggested in another answer).
from bisect import bisect
class CaseInsensitiveSortedList:
def __init__(self, iterable):
self.with_case = list(sorted(iterable, key=lambda s: s.lower()))
self.without_case = [s.lower() for s in self.with_case]
def insert_in_order(self, s):
s_lower = s.lower()
index = bisect(self.without_case, s_lower)
self.without_case.insert(index, s_lower)
self.with_case.insert(index, s)
test_list = CaseInsensitiveSortedList(['a', 'B', 'cd', 'E', 'fff'])
test_list.insert_in_order('D')
print(test_list.with_case) # ['a', 'B', 'cd', 'D', 'E', 'fff']
test_list.insert_in_order('ee')
print(test_list.with_case) # ['a', 'B', 'cd', 'D', 'E', 'ee', 'fff']
You could extend list
directly and make this a little "more natural" or do whatever else you want with it. This is just an idea to avoid calling str.lower
on every element for every insert.
Upvotes: 3
Reputation: 48100
Based on comment from OP, since it is OK to maintain the intermediate list to hold lower-cased strings (one time operation); it could be achieved as:
from bisect import bisect
my_list = ["aa", "bb", "Dd", "ee"]
lower_list = [i.lower() for i in my_list] # list of lower-cased strings.
# one time operation
insert_string = "CC" # word to insert
# find index based on lower-cased list
index = bisect(lower_list, insert_string.lower())
my_list.insert(index, insert_string) # insert word in original list
lower_list.insert(index, insert_string.lower()) # insert lower-cased word
# in lower-cased list
where final value of my_list
will be lower_list
will be:
>>> my_list # original list
['aa', 'bb', 'CC', 'Dd', 'ee']
>>> lower_list
['aa', 'bb', 'cc', 'dd', 'ee']
Here we are bisecting the list of lower-cased words to find the index and based on the index, inserting the string into original list.
Upvotes: 1
Reputation: 14236
I have never worked with bisect but here is my stab at it. The first function I take directly from the bisect
page that you linked:
def index(a, x):
'Locate the leftmost value exactly equal to x'
i = bisect_left(a, x)
if i != len(a) and a[i] == x:
return i
raise ValueError
def insert_into_sorted(char, my_list):
marker = chr(ord(char) + 1)
spot = index(my_list, marker)
my_list[spot:spot] = char
return my_list
x = ['a', 'b', 'd', 'e']
insert_into_sorted('c', x)
>>['a', 'b', 'c', 'd', 'e']
Upvotes: 1
Reputation: 48100
You may use bisect.bisect
on the lower-cased sorted list as:
from bisect import bisect
my_list = ["aa", "bb", "Dd", "ee"]
insert_string = "CC"
# convert all the items in list to lower case for
# v finding the correct location via. bisect
index = bisect([i.lower() for i in my_list], insert_string.lower())
# bisect based on lower-cased string for ^
# case-insensitive behavior
my_list.insert(index, insert_string)
where updated content of my_list
will be:
['aa', 'bb', 'CC', 'Dd', 'ee']
Upvotes: 3