Quickly searching for items from a large list in a large Python dictionary

Question

I am building a dictionary that maps a tuple of names to a float, of the form {(nameA, nameB): datavalue, (nameB, nameC): datavalue, ...}

The values come from a matrix that I have loaded into a pandas DataFrame, with the names as both the index and the column labels. I have created an ordered list of the keys for my final dictionary, called keys, with the function createDictionaryKeys(). The problem is that not all of the names in this list appear in my data matrix. I want my final dictionary to include only the names that do appear in the data matrix.

How can I do this search without a slow linear for loop? I have also created a dictionary, called allow_dict, that maps each name to 1 if it should be included and 0 otherwise. It has the form {nameA: 1, nameB: 0, ...}. I was hoping to use it for some sort of hash-based lookup.
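
For illustration, the hash lookup described above could look roughly like the sketch below. It assumes that keys is a list of 2-tuples of names and that allow_dict has the form given above; the helper name filter_keys and the sample data are hypothetical and not part of my actual code.

```python
def filter_keys(keys, allow_dict):
    """Keep only the pairs whose two names are both flagged 1 in allow_dict."""
    # Build the set once; each membership test afterwards is a hash lookup
    # rather than a scan over a list.
    allowed = {name for name, flag in allow_dict.items() if flag == 1}
    # Names missing from allow_dict are simply not in the set, so they are dropped.
    return [pair for pair in keys if pair[0] in allowed and pair[1] in allowed]


# Hypothetical sample data in the same shape as described in the question.
keys = [("nameA", "nameB"), ("nameB", "nameC"), ("nameA", "nameC")]
allow_dict = {"nameA": 1, "nameB": 0, "nameC": 1}

present = filter_keys(keys, allow_dict)
print(present)  # [('nameA', 'nameC')]
```

The list comprehension still visits every pair once, but the per-pair check no longer depends on how many names there are.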

def createDictionary( keynamefile, seperator, datamatrix, matrixsep):
    import pandas as pd

    keys = createDictionaryKeys(keynamefile, seperator)
    final_dict = {}

    # Pass the path directly so pandas opens and closes the file itself
    data_df = pd.read_csv(datamatrix, sep = matrixsep)
    pd.set_option("display.max_rows", len(data_df))

    df_indices = list(data_df.index.values)
    df_cols = list(data_df.columns.values)[1:]
    for i in df_indices:
        data_df = data_df.rename(index = {i:df_cols[i]})
    # axis must be given by keyword in current pandas versions
    data_df = data_df.drop("Unnamed: 0", axis = 1)

    allow_dict = descriminatePromoters( HARDCODEDFILENAME, SEP, THRESHOLD )

    #print ( item for item in df_cols if allow_dict[item] == 0 ).next()

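    # df_cols is a list, so each "in" test below is a linear scan (the slow step)
    present = [ x for x in keys if x[0] in df_cols and x[1] in df_cols]
    for i in present:
        # look up the value by (row label, column label) in the renamed DataFrame
        final_dict[i] = data_df.loc[i[0],i[1]]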

    return final_dict
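
The filtering and lookup at the end of the function above could be sketched as follows, replacing the list membership tests with a set and the per-pair .loc calls with one positional lookup. This is only a sketch under assumptions: the helper name build_pair_dict and the sample DataFrame are hypothetical, the index and column labels are assumed to be unique (Index.get_indexer requires that), and the values are assumed to be numeric.

```python
import pandas as pd


def build_pair_dict(keys, data_df):
    """Map each (row name, column name) pair found in data_df to its value."""
    # Set membership is a hash lookup, unlike "in" on a list of column names.
    labels = set(data_df.index) & set(data_df.columns)
    present = [(a, b) for a, b in keys if a in labels and b in labels]
    if not present:
        return {}
    # Translate labels to integer positions once, then index the array in bulk.
    rows = data_df.index.get_indexer([a for a, _ in present])
    cols = data_df.columns.get_indexer([b for _, b in present])
    values = data_df.to_numpy()[rows, cols]
    return dict(zip(present, values.tolist()))


# Hypothetical 2x2 matrix with the names as both index and column labels.
data_df = pd.DataFrame(
    [[0.0, 1.5], [2.5, 0.0]],
    index=["nameA", "nameB"],
    columns=["nameA", "nameB"],
)
keys = [("nameA", "nameB"), ("nameB", "nameC"), ("nameB", "nameA")]

final_dict = build_pair_dict(keys, data_df)
print(final_dict)  # {('nameA', 'nameB'): 1.5, ('nameB', 'nameA'): 2.5}
```

The pair ("nameB", "nameC") is dropped because nameC is not a label in the matrix.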
