Finding abbreviations in a string in Python

Question

Let's assume we have a few possible character combinations:

mystr = 'NRWTD'
my2str = 'RAWBC'

Now all I know is this:

vdCacheType = {'AWB' : 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through',  'R' : 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached' }

As you can see, each string is a concatenation of abbreviations that are one or more characters long. My question is: how can I take such a string and check which of its character combinations can be found in the dictionary?

I already tried this:

for x in vdCacheType:
    if x in mystr:
        print x # Here i would save the found abbr. in a list for later use
        mystr = mystr.strip(x)

The problem is that for NRWTD it finds:

Found Char:  R
New String:  NRWTD
Found Char:  WT
New String:  NRWTD
Found Char:  NR
New String:  WTD
Found Char:  D
New String:  WT

My intention is to return:

No Read Ahead, Write Through, Direct

instead of NRWTD. Any help is appreciated, and if there is a better approach to this problem, I'm open to it. Thanks anyway!

Jon Clements · Accepted Answer

Match the longest possible abbreviations first, along the lines of:

vdCacheType = {'AWB' : 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through',  'R' : 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached' }

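import re

# Put longer abbreviations first so 'Ra' is tried before its prefix 'R'.
keys = sorted(vdCacheType, key=len, reverse=True)
# re.escape keeps the pattern safe if a key ever contains a regex metacharacter.
rx = re.compile('|'.join(re.escape(k) for k in keys))
print(', '.join(vdCacheType[m] for m in rx.findall('NRWTD')))
# No Read Ahead, Write Through, Direct IO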

And RAWBC comes out as: Read Ahead Always, Always WriteBack, Cached
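Why the sort matters: the regex scans left to right and, at each position, takes the first alternative that matches. The order only changes the result when one key is a prefix of another, which in this table is the case for 'R' and 'Ra'. A small sketch to illustrate, where split_abbreviations and the input 'RaWB' are made up for the demonstration:

```python
import re

# Same mapping as in the question.
vdCacheType = {'AWB': 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through', 'R': 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached'}

def split_abbreviations(text, order):
    """Hypothetical helper: find keys in text, trying alternatives in the given order."""
    rx = re.compile('|'.join(re.escape(k) for k in order))
    return rx.findall(text)

longest_first = sorted(vdCacheType, key=len, reverse=True)
shortest_first = sorted(vdCacheType, key=len)

print(split_abbreviations('RaWB', longest_first))   # ['Ra', 'WB']
print(split_abbreviations('RaWB', shortest_first))  # ['R', 'WB']
```

With the shortest keys first, 'R' wins at the start, the leftover 'a' matches nothing, and findall silently skips it.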

Tweak according to case sensitivity and whether the entire text must be a complete abbreviation (or a series of them).
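One possible way to make both tweaks, as a rough sketch rather than the only option: match at an explicit position so nothing can be skipped, and optionally fold everything to upper case. The function name expand and its ignore_case parameter are invented for this example.

```python
import re

# Same mapping as in the question.
vdCacheType = {'AWB': 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through', 'R': 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached'}

def expand(text, table=vdCacheType, ignore_case=False):
    """Hypothetical helper: expand every abbreviation in text or raise ValueError."""
    if ignore_case:
        # Assumption: upper-casing the keys does not make two of them collide.
        lookup = {k.upper(): v for k, v in table.items()}
        text = text.upper()
    else:
        lookup = dict(table)
    keys = sorted(lookup, key=len, reverse=True)
    rx = re.compile('|'.join(re.escape(k) for k in keys))
    pos, names = 0, []
    while pos < len(text):
        m = rx.match(text, pos)  # must match exactly at pos, so nothing is skipped
        if m is None:
            raise ValueError('unknown abbreviation at index %d in %r' % (pos, text))
        names.append(lookup[m.group()])
        pos = m.end()
    return ', '.join(names)

print(expand('NRWTD'))                    # No Read Ahead, Write Through, Direct IO
print(expand('rawbc', ignore_case=True))  # Read Ahead Adaptive, Write Back, Cached
```

Note that ignoring case changes how RAWBC is read: 'Ra' now matches 'RA', so the result differs from the case-sensitive one shown above. Which reading is right depends on the data, so that choice has to be made deliberately.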
