wagner-felix

Reputation: 875

Finding abbreviations in a string in python

Let's assume we have a few possible character combinations:

mystr = 'NRWTD'
my2str = 'RAWBC'

Now all I know is this:

vdCacheType = {'AWB' : 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through',  'R' : 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached' }

As you can see, each string is a concatenation of abbreviations, each one or more characters long. My question is: how can I take such a string and check which character combinations from the dictionary it contains?

I already tried this:

for x in vdCacheType:
    if x in mystr:
        print(x)  # Here I would save the found abbreviation in a list for later use
        mystr = mystr.strip(x)

The problem is that for NRWTD it finds:

Found Char:  R
New String:  NRWTD
Found Char:  WT
New String:  NRWTD
Found Char:  NR
New String:  WTD
Found Char:  D
New String:  WT

My intention is to return:

No Read Ahead, Write Through, Direct

instead of NRWTD. Any help is appreciated, and I'm open to a better approach to this problem. Thanks anyway!

Upvotes: 3

Views: 3840

Answers (2)

Felix Castor

Reputation: 1675

Jon Clements's solution is correct, but here is another one.

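I had to make a separate list of the keys to control the order in which they are tried. When I iterated over vdCacheType.keys(), they came out in this order: ['R', 'C', 'WT', 'WB', 'NR', 'AWB', 'D', 'RA'], which won't work because a short key such as R is tried before the longer keys that contain it (NR, RA).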

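str.strip(x) won't work in this case because it does not remove the substring x: it trims any of the characters in x from the two ends of the string only, so a match in the middle is left untouched.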
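As a small illustration of that behaviour (the variable names below are only for this sketch), compare str.strip with a one-time str.replace that blanks out the match:

```python
mystr = 'NRWTD'

# strip() treats its argument as a set of characters to trim from both ends.
unchanged = mystr.strip('WT')   # neither W nor T is at an end of the string
trimmed = mystr.strip('NR')     # leading N and R are trimmed

# replace() works on the substring itself; the space keeps the
# neighbouring letters from joining into a new, false abbreviation.
blanked = mystr.replace('WT', ' ', 1)

print(repr(unchanged))  # 'NRWTD'
print(repr(trimmed))    # 'WTD'
print(repr(blanked))    # 'NR D'
```

This is why the attempt in the question reports WT as found but leaves the string unchanged.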

vdCacheType = {'AWB' : 'Always WriteBack', 'WB': 'Write Back',
           'NR': 'No Read Ahead', 'RA': 'Read Ahead Adaptive',
           'WT': 'Write Through',  'R' : 'Read Ahead Always',
           'D': 'Direct IO', 'C': 'Cached' }

vdCacheKeys = ['AWB','WB','NR','RA','WT','R','D','C']

mystr = 'NRWTD'
my2str = 'RAWBC'

listAbbr = []
result = ''
index = 0 


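print(list(vdCacheType.keys()))  # dictionary order, not suitable for matching
for x in vdCacheKeys:
    if x in mystr:
        listAbbr.append(x)  # save the found abbreviation for later use
        index = mystr.find(x)
        # Blank out the match so its letters cannot match a shorter key later
        mystr = mystr[:index] + ' ' + mystr[index + len(x):]
        print(mystr)
# Join once at the end to avoid a trailing ', '
result = ', '.join(vdCacheType[x] for x in listAbbr)
print(result)

Output: No Read Ahead, Write Through, Direct IO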
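One limitation of looping over the keys is that the results come out in key order rather than in the order they appear in the string. A minimal sketch of a regex-free alternative, assuming a greedy left-to-right scan is acceptable (scan_abbreviations is a hypothetical helper name, not part of the code above):

```python
vdCacheType = {'AWB': 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'RA': 'Read Ahead Adaptive',
               'WT': 'Write Through', 'R': 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached'}


def scan_abbreviations(text, table):
    """Hypothetical helper: greedy scan that tries the longest keys first."""
    keys = sorted(table, key=len, reverse=True)
    found = []
    pos = 0
    while pos < len(text):
        for key in keys:
            if text.startswith(key, pos):
                found.append(key)
                pos += len(key)  # continue right after the match
                break
        else:
            pos += 1  # no key starts here, skip this character
    return found


print(', '.join(vdCacheType[k] for k in scan_abbreviations('NRWTD', vdCacheType)))
# No Read Ahead, Write Through, Direct IO
```

Because the scan walks the string from left to right, an input such as 'DNR' yields D before NR, and sorting by length removes the need for a hand-ordered key list.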

Upvotes: 1

Jon Clements

Reputation: 142146

Match the longest possible abbreviations first, along the lines of:

vdCacheType = {'AWB' : 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through',  'R' : 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached' }

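import re
# Sort keys longest-first so that e.g. 'AWB' is tried before 'WB';
# re.escape keeps the pattern safe if a key ever contains a regex metacharacter
rx = re.compile('|'.join(re.escape(k) for k in sorted(vdCacheType, key=len, reverse=True)))
print(', '.join([vdCacheType[m] for m in rx.findall('NRWTD')]))
# No Read Ahead, Write Through, Direct IO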

And RAWBC comes out as: Read Ahead Always, Always WriteBack, Cached

Tweak according to case sensitivity and whether the entire text must be a complete acronym (or a series of acronyms).
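A minimal sketch of those two tweaks, under the assumption that an input with leftover characters should be rejected rather than silently skipped. The name expand_abbreviations and its ignore_case flag are hypothetical, introduced only for this illustration:

```python
import re

vdCacheType = {'AWB': 'Always WriteBack', 'WB': 'Write Back',
               'NR': 'No Read Ahead', 'Ra': 'Read Ahead Adaptive',
               'WT': 'Write Through', 'R': 'Read Ahead Always',
               'D': 'Direct IO', 'C': 'Cached'}


def expand_abbreviations(text, table, ignore_case=False):
    """Hypothetical helper: longest-first split that must cover all of text."""
    if ignore_case:
        # Normalise both the keys and the input to upper case
        table = {key.upper(): value for key, value in table.items()}
        text = text.upper()
    keys = sorted(table, key=len, reverse=True)
    pattern = '|'.join(re.escape(key) for key in keys)
    matches = re.findall(pattern, text)
    # findall skips characters it cannot match, so compare with the input
    if ''.join(matches) != text:
        raise ValueError('not a series of known abbreviations: %r' % text)
    return [table[m] for m in matches]


print(', '.join(expand_abbreviations('RAWBC', vdCacheType)))
# Read Ahead Always, Always WriteBack, Cached
print(', '.join(expand_abbreviations('RAWBC', vdCacheType, ignore_case=True)))
# Read Ahead Adaptive, Write Back, Cached
```

With the dictionary as written, the key Ra only matches RA once case is ignored, which is why the two calls split RAWBC differently.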

Upvotes: 5
