Mike Trpcic
Mike Trpcic

Reputation: 25659

fastest way to compare strings in python

I'm writing a script in Python that will allow the user to input a string, which will be a command that instructs the script to perform a specific action. For the sake of argument, I'll say my command list is:

lock
read
write
request
log

Now, I want the user to be able to enter the word "log" and it will peform a specific action, which is very simple. However, I would like to match partial words. So, for example, if a user enters "lo", it should match "lock", as it's higher in the list. I've tried using strncmp from libc using ctypes to accomplish this, but have yet to make heads or tails of it.

Upvotes: 8

Views: 7564

Answers (10)

SpliFF
SpliFF

Reputation: 38976

import timeit

cmds = []
for i in range(1,10000):
    cmds.append("test")

def get_cmds(user_input):
    return [c for c in cmds if c.startswith(user_input)]

if __name__=='__main__':
    t = timeit.Timer("get_cmds('te')", "from __main__ import get_cmds")
    print "%0.3f seconds" % (t.timeit(number=1))

#>>> 0.008 seconds

So basically, per my comment, you're asking how to optimise an operation that takes no measurable time or CPU. I used 10,000 commands here and the test string matches every one just to show that even under extreme circumstances you could still have hundreds of users doing this and they would never see any lag.

Upvotes: 0

gahooa
gahooa

Reputation: 137332

This is optimized at runtime like you requested... (although most likely not needed)

Here is a simple bit of code which will take an input dictionary of command mapped to function, and results in an output dictionary of all non-duplicate sub commands mapped to the same function.

So you run this when you start your service, and then you have 100% optimized lookups. I am sure there is a more clever way to do this, so feel free to edit.

commands = {
  'log': log_function,
  'exit': exit_function,
  'foo': foo_function,
  'line': line_function,
  }

cmap = {}
kill = set()
for command in commands:
  for pos in range(len(1,command)):
    subcommand = command[0:pos]
    if subcommand in cmap:
      kill.add(subcommand)
      del(cmap[subcommand])
    if subcommand not in kill:
      cmap[subcommand] = commands[command]

#cmap now is the following - notice the duplicate prefixes removed?
{
  'lo': log_function,
  'log': log_function,
  'e': exit_function,
  'ex': exit_function,
  'exi': exit_function,
  'exit': exit_function,
  'f' : foo_function,
  'fo' : foo_function,
  'foo' : foo_function,
  'li' : line_function,
  'lin' : line_function,
  'line' : line_function,
}

Upvotes: 3

Bear
Bear

Reputation: 1127

Replace with your favorite string compare function. Fairly fast, and to the point.

matches = ( x for x in list if x[:len(stringToSearchFor)] == stringToSearchFor )
print matches[0]

Upvotes: 0

John Machin
John Machin

Reputation: 82934

This will do what you want:

def select_command(commands, user_input):
    user_input = user_input.strip().lower()
    for command in commands:
        if command.startswith(user_input):
            return command
    return None

However:

You seem overworried about the wrong thing. So 50 users means 50 milliseconds -- you're not going to be run out of town for that kind of "lag". Worry about inefficient database access or problems caused by users typing "r" and getting "read" when they thought they'd get "request". Minimising user keystrokes at the risk of errors is so 1960s that it's not funny. What are they using? ASR33 teletypes? At the very least you could insist on a unique match -- "rea" for read and "req" for request.

Upvotes: 5

doug
doug

Reputation: 70048

If i understand your Q correctly, you want a snippet that will return the answer as soon as it has it, without traversing further through your 'command list.' This should do what you want:

from itertools import ifilter

def check_input(some_string, code_book) :
    for q in ifilter(code_book.__contains__, some_string) :
        return True
    return False

Upvotes: 0

Peter Hansen
Peter Hansen

Reputation: 22057

This is adapted from J.Tauber's Trie implementation in Python, which you could compare and/or re-adapt with whatever extra features you need. See also the Wikipedia entry on tries.

class Trie:
    def __init__(self):
        self.root = [None, {}]

    def add(self, key):
        curr_node = self.root
        for ch in key:
            curr_node = curr_node[1].setdefault(ch, [key, {}])
        curr_node[0] = key

    def find(self, key):
        curr_node = self.root
        for ch in key:
            try:
                curr_node = curr_node[1][ch]
            except KeyError:
                return None
        return curr_node[0]

Setup (order of addition matters!):

t = Trie()
for word in [
   'lock',
   'read',
   'write',
   'request',
   'log']:
   t.add(word)

Then call like this:

>>> t.find('lo')
'lock'
>>> t.find('log')
'log'
>>> t.find('req')
'request'
>>> t.find('requiem')
>>>

Upvotes: 0

Michael Anderson
Michael Anderson

Reputation: 73490

I suggest you look at using the readline python library, rather than reinventing the wheel. The user will have to hit tab to complete the word, but you can set readline up so that tab matches as far as possible or cycles through all words starting wit the current stub.

This seems to be a fairly decent introduction to readline in python http://www.doughellmann.com/PyMOTW/readline/index.html

Upvotes: 2

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798746

jaro_winkler() in python-Levenshtein might be what you're looking for.

Upvotes: 1

ghostdog74
ghostdog74

Reputation: 342433

you can use startswith

eg

myword = "lock"
if myword.startswith("lo"):
   print "ok"

or if you want to find "lo" in the word, regardless of position, just use the "in" operator

if "lo" in myword

therefore, one way you can do this:

for cmd in ["lock","read","write","request","log"]:
    if cmd.startswith(userinput):
        print cmd
        break

Upvotes: 2

Ned Batchelder
Ned Batchelder

Reputation: 375624

If you are accepting input from a user, then why are you worried about the speed of comparison? Even the slowest technique will be far faster than the user can perceive. Use the simplest most understandable code you can, and leave efficiency concerns for tight inner loops.

cmds = [
    "lock",
    "read",
    "write",
    "request",
    "log",
    ]

def match_cmd(s):
    matched = [c for c in cmds if c.startswith(s)]
    if matched:
        return matched[0]

Upvotes: 16

Related Questions