rshar

Reputation: 1477

Check if a value in a dictionary is a substring of another key-value pair in Python

I have a dictionary (dis_dict in my code below) whose values are lists. I would like to fetch the key and value for specific keys, then check whether each value (as a substring) appears in the values of other keys, and fetch all of those key -> value pairs.

For example, take the dictionary below. I would like to check whether 'Stroke' or 'stroke' exists as a key and, if so, whether any of its values is a substring of a value of another key (for example, 'C10.228.140.300.775' is contained in 'C10.228.140.300.775.600', and 'C14.907.253.855' is contained in 'C14.907.253.855.600').

dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']
}
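A quick check of the substring relation described above, using codes from the example dictionary. One caveat worth knowing: a plain substring test can also match codes that are not children (for example, 'C10.228' is a substring of 'C10.2281'). Assuming the codes are dotted hierarchical tree numbers, checking for the parent code followed by a dot is a stricter alternative. The helper is_child is hypothetical, not part of the question's code.

```python
def is_child(parent: str, code: str) -> bool:
    # Hypothetical helper: True if code sits below parent in a dotted hierarchy
    return code.startswith(parent + '.')

parent = 'C10.228.140.300.775'
print(parent in 'C10.228.140.300.775.600')  # True
print(parent in 'C10.228.140.300.275.800')  # False
print(is_child(parent, 'C10.228.140.300.775.600'))  # True

# A plain substring test gives a false positive here; the prefix check does not
print('C10.228' in 'C10.2281')  # True
print(is_child('C10.228', 'C10.2281'))  # False
```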

I have the following lines of code for fetching the key and value for a specific term.

# Extract all child terms
for k, v in dis_dict.items():
    if k in ['Glaucoma', 'Stroke', 'glaucoma', 'stroke']:
        disease = k
        tree_id = v
        print(disease, tree_id)
    else:
        disease = ''
        tree_id = ''
        continue
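As a sketch, the same filter can be written as a dict comprehension that collects the matching entries instead of printing them. This assumes a case-insensitive comparison is acceptable (k.lower() also accepts 'STROKE', which the list above does not); the names search_terms and found are illustrative.

```python
dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
}

# Keep only the entries whose key is one of the searched terms (case-insensitive)
search_terms = {'glaucoma', 'stroke'}
found = {k: v for k, v in dis_dict.items() if k.lower() in search_terms}
print(found)  # {'Stroke': ['C10.228.140.300.775', 'C14.907.253.855']}
```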

Any help is highly appreciated!

Upvotes: 0

Views: 97

Answers (3)

Joooeey

Reputation: 3866

You don't need a huge amount of code for this.

The main thing to know is that you can test for a substring with the in operator. For example, "abc" in "abcdef" evaluates to True.
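One detail matters here because the dictionary values are lists: in on a string tests for a substring, but in on a list tests whether an element is equal to the value. A small illustration, using codes from the question:

```python
codes = ['C10.228.140.300.275.800', 'C10.228.140.300.775.600']

print("abc" in "abcdef")  # True: substring test on a string
print('C10.228.140.300.775' in codes)  # False: on a list, in tests element equality

# To look for a substring inside any list element, check each element
print(any('C10.228.140.300.775' in code for code in codes))  # True
```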

  1. Check whether one key is a substring of another: if k1.lower() in k2.lower(). (I used .lower() here for a case-insensitive comparison; I'm not sure whether that's required.)
  2. If it is, go through both lists and look for a code that is contained in a code of the other list (if search_string in find_string). That's what the function print_match does.
dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': [
        'C10.228.140.300.275.800',
        'C10.228.140.300.775.600',
        'C14.907.253.329.800',
        'C14.907.253.855.600'
    ]
}

def print_match(k1, v1, k2, v2):
    # Print once if any code in v1 is a substring of any code in v2
    for search_string in v1:
        for find_string in v2:
            if search_string in find_string:
                print(f"{k1}: {v1} found in {k2}: {v2}")
                return

for k1, v1 in dis_dict.items():
    if k1.lower() in ["glaucoma", "stroke"]:
        for k2, v2 in dis_dict.items():
            if k1 == k2:  # don't compare a key with itself
                continue
            if k1.lower() in k2.lower():
                print_match(k1, v1, k2, v2)
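If you need every matching pair of codes rather than a single printed line per key, a variation can collect the matches and return them. This is a sketch under the same assumptions as the answer (a key must contain the searched term, comparison is case-insensitive); find_matches is a hypothetical name.

```python
def find_matches(d, terms):
    """Return (key, code, other_key, other_code) tuples where code is a
    substring of other_code and key is one of the searched terms."""
    wanted = {t.lower() for t in terms}
    matches = []
    for k1, v1 in d.items():
        if k1.lower() not in wanted:
            continue
        for k2, v2 in d.items():
            # Skip the key itself and keys that don't contain it
            if k1 == k2 or k1.lower() not in k2.lower():
                continue
            for code1 in v1:
                for code2 in v2:
                    if code1 in code2:
                        matches.append((k1, code1, k2, code2))
    return matches

dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
}

for match in find_matches(dis_dict, ['Glaucoma', 'Stroke']):
    print(match)
```

Returning a list instead of printing makes it easier to reuse the result, for example to build a dict of child codes per disease.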

Upvotes: 0

Forague

Reputation: 169

You have a good starting point. As you probably already know, you need to process the key so that you can split it into words. Here is how you could do it:

disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flu': ['C10.228.140.300.780']
}

for k, v in disease_dict.items():
    tmp = ''.join(x for x in k if x.isalpha() or x == '-' or x == ' ')
    tmpKey = tmp.split(' ')
    for tk in tmpKey:
        if tk.capitalize() in ['Stroke', 'Glaucoma']:
            print(k, v, end=' ')  # print a space instead of a newline

First, we remove unnecessary characters with this line:

tmp = ''.join(x for x in k if x.isalpha() or x == ' ' or x == '-')

It keeps only alphabetic characters, spaces, and dashes. Since I don't know what your disease names look like, I kept only those characters (the space is needed for the split on the next line). After building this cleaned key, we split it on spaces so we can compare its individual words.

tmpKey = tmp.split(' ')
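Here is the cleanup and split step in isolation, wrapped in a hypothetical helper key_words for illustration. The second call shows one edge case: split(' ') produces an empty string when two spaces end up next to each other, whereas split() with no argument would drop it.

```python
def key_words(key):
    # Keep letters, spaces and dashes, then split the cleaned key on spaces
    tmp = ''.join(x for x in key if x.isalpha() or x == ' ' or x == '-')
    return tmp.split(' ')

print(key_words('Stroke, Lacunar'))   # ['Stroke', 'Lacunar']
print(key_words('Stroke , Lacunar'))  # ['Stroke', '', 'Lacunar']
```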

Once tmpKey is built, we loop over it to check whether the disease you are looking for appears in the original key.

for tk in tmpKey:
    if tk.capitalize() in ['Stroke', 'Glaucoma']:
        print(k, v, end=' ')  # print a space instead of a newline

tk.capitalize() uppercases the first letter and lowercases the rest, so you don't have to check both 'stroke' and 'Stroke'.
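A short demonstration of how capitalize() normalizes different spellings of the same word. It also shows why the key is split first: on a multi-word string, only the very first character is uppercased.

```python
variants = ['stroke', 'Stroke', 'STROKE', 'sTROKE']
for word in variants:
    print(word.capitalize())  # prints Stroke for every variant

print('stroke, LACUNAR'.capitalize())  # Stroke, lacunar
```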

Finally, after running the above script, here is what we got:

Stroke ['C10.228.140.300.775', 'C14.907.253.855'] Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600'] 
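If you would rather get the matching entries back than print them on one line, the same logic can collect them into a dict. This is a sketch: keys_with_word is a hypothetical name, and it uses split() with no argument plus any() so that a key mentioning the term twice is still added only once.

```python
disease_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
    'Flu': ['C10.228.140.300.780'],
}

def keys_with_word(d, terms):
    # Hypothetical helper: entries whose key contains one of the terms as a word
    wanted = {t.capitalize() for t in terms}
    result = {}
    for k, v in d.items():
        tmp = ''.join(x for x in k if x.isalpha() or x == ' ' or x == '-')
        if any(tk.capitalize() in wanted for tk in tmp.split()):
            result[k] = v
    return result

print(list(keys_with_word(disease_dict, ['Stroke', 'Glaucoma'])))  # ['Stroke', 'Stroke, Lacunar']
```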

Upvotes: 1

user7711283

The code below should do what you want to achieve:

dis_dict = {
    'Stroke':          ['C10.228.140.300.775', 'C14.907.253.855'], 
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
}

already_printed = set()  # outputs that have already been printed
for k, v in dis_dict.items():
    if k.lower() in ['glaucoma', 'stroke']:
        disease = k
        tree_id = v
        output = None
        for c_code_1 in tree_id:
            for key, value in dis_dict.items():
                if key == disease:  # don't match an entry against itself
                    continue
                for c_code_2 in value:
                    if c_code_1 in c_code_2:
                        tmp_output = f'{disease} {tree_id}, other: {key} {value}'
                        if tmp_output not in already_printed:
                            output = tmp_output
                            print(output)
                            already_printed.add(output)
        # No other entry matched: print only the disease itself
        if output is None:
            output = f'{disease} {tree_id}'
            print(output)

Test it with other dictionary data to check that it works as expected. When a code of the searched disease is contained in a code of another entry, it prints:

Stroke ['C10.228.140.300.775', 'C14.907.253.855'], other: Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']

If no other entry matches (for example, after changing the dictionary values to avoid a match), it prints only the disease that was found:

Stroke ['C10.228.140.300.775', 'C14.907.253.855']
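The same check can also return a structure instead of printed strings, which makes the "match" and "no match" cases easy to tell apart: an empty list means no other entry matched. This is a sketch; related_diseases is a hypothetical name, and it uses the answer's modified dictionary.

```python
def related_diseases(d, terms):
    # Hypothetical helper: map each searched disease to the other keys that have
    # at least one code containing one of its codes as a substring
    wanted = {t.lower() for t in terms}
    related = {}
    for disease, codes in d.items():
        if disease.lower() not in wanted:
            continue
        related[disease] = [
            key for key, values in d.items()
            if key != disease and any(c1 in c2 for c1 in codes for c2 in values)
        ]
    return related

dis_dict = {
    'Stroke':          ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855'],
}

print(related_diseases(dis_dict, ['glaucoma', 'stroke']))  # {'Stroke': ['Stroke, Lacunar']}
```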

Upvotes: 1
