Subset a dictionary of dataframes through iteration

Question

I am creating a program to ingest a series of text files from 2001Q1 through 2016Q1, selected by qualifiers in the file name that indicate the schedule/report type. I refer to these qualifiers as keys (for lack of a better name):

keys=[' RI ','RCD','RCF','RCG','RCH','RCL','RCO','RCRII']

Given a path C:\files, I create a list of all eligible text files:

files=[]
for k in keys:
    for i in os.listdir(path):
        if os.path.isfile(os.path.join(path,i)) and k in i:
            files.append(i)

Then I create a dictionary

    df_dict={file[:-4].replace(" ","_"):pd.read_table(os.path.join(path,file),header=[0,1],index_col=0,error_bad_lines=False,dtype={'IDRSSD':object}, low_memory=False) for file in files}

The resulting dictionary maps each cleaned-up file name to its dataframe, for example: {Schedule_RI_2001Q1: (Col1 Col2 ColN), Schedule_RCO_2001Q1: (Col1 Col2 ColN), Schedule_RI_2005Q2: (Col1 Col2 ColN)}

I need to create separate dictionaries from the main dictionary, one per report type. I came up with this script (I know it's amateurish):

for key in keys:
    for k in df_dict.keys():
        for v in df_dict.values():
            if key in k:
                key.strip={k:v}

Whether I use key.strip or key.strip(), I receive an error message: "'str' object attribute 'strip' is read-only" or "can't assign to function call", respectively. Is there a better way to accomplish this task? I created the aggregate dictionary so that I could do some data formatting on it first. Assistance in breaking out the dictionary would be greatly appreciated.

oxalorg · Accepted Answer

You can't assign a dictionary to key.strip or key.strip(). The first is a read-only method of the string, and the second is a function call, which is not a valid assignment target. You can, however, create a container dictionary and use the value returned by key.strip() as a key in it.

This is the safer method:

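keys = ['a', 'b']
df_dict = { 'a_2010': 1, 'a_2007': 2, 'Schedule_b_2009Q1': 3 }

# Container that holds one sub-dictionary per key
sub_dict = {}

for key in keys:
    sub_dict[key.strip()] = {}
    for k, v in df_dict.items():
        # Substring test: keep every entry whose name contains the key
        if key in k:
            sub_dict[key.strip()][k] = v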

Output:

>>> sub_dict
{'a': {'a_2007': 2, 'a_2010': 1},
 'b': {'Schedule_b_2009Q1': 3}}
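
One thing to watch with the substring test: in the question the spaces in the file names are replaced with underscores, so ' RI ' would no longer be found in 'Schedule_RI_2001Q1', while the stripped 'RI' would also match 'Schedule_RCRII_2001Q1'. A minimal sketch of one way around this, assuming the names always look like Schedule_<type>_<quarter> as in the sample dictionary, is to compare whole underscore-separated tokens. The name by_report and the integer stand-ins for the dataframes are only for illustration:

```python
from collections import defaultdict

keys = [' RI ', 'RCD', 'RCRII']

# Integers stand in for the DataFrames of the question (illustration only).
df_dict = {
    'Schedule_RI_2001Q1': 1,
    'Schedule_RI_2005Q2': 2,
    'Schedule_RCD_2001Q1': 3,
    'Schedule_RCRII_2001Q1': 4,
}

# Strip the padding once so the keys can be compared with name tokens.
wanted = {key.strip() for key in keys}

# by_report is a hypothetical name: report type -> {name: value}
by_report = defaultdict(dict)
for name, value in df_dict.items():
    # Compare whole tokens, so 'RI' does not match inside 'RCRII'.
    for token in name.split('_'):
        if token in wanted:
            by_report[token][name] = value
```

With this, by_report['RI'] holds only the two RI schedules, and the RCRII schedule ends up under 'RCRII' alone.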

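If the above seems unnecessarily complex, you can use locals() to create one variable per key. This only works at module level, where locals() is the same dictionary as globals(); inside a function, writing to locals() does not create variables, which is one reason it is usually not good practice: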

keys = ['a', 'b', 'c']
df_dict = { 'a_2010': 1, 'a_2007': 2, 'Schedule_b_2009Q1': 3 }

for key in keys:
    locals()[key.strip()] = {}
    for k, v in df_dict.items():
        if key in k:
            locals()[key.strip()][k] = v

Output:

>>> a
{'a_2007': 2, 'a_2010': 1}
>>> b
{'Schedule_b_2009Q1': 3}
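
One reason locals() is discouraged for this: the trick relies on the code running at module level. As a small sketch (the function and variable names here are made up for the demonstration), the same assignment inside a function does not create a usable variable:

```python
def make_variable():
    # Writing into the mapping returned by locals() inside a function
    # does not define a real local variable.
    locals()['created'] = {}
    try:
        # 'created' is never assigned as a real name, so this lookup fails.
        return created
    except NameError:
        return 'NameError'

result = make_variable()
```

Here result is the string 'NameError', whereas a plain dictionary such as sub_dict behaves the same way everywhere.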
