Richard21

Reputation: 157

How can I take multiple pandas DataFrame samples by iterating over a dictionary?

I have a dictionary like this:

import numpy as np
import pandas as pd

dic = {'AGS': np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=np.int64),
       'CM': np.array([1, 1, 2, 2], dtype=np.int64),
       'COA': np.array([1, 1, 1, 2, 2, 3, 3], dtype=np.int64),
       'COL': np.array([1, 2], dtype=np.int64)}

And a dataframe like this:

c = pd.DataFrame(data={'CTY':['AGS', 'AGS', 'AGS', 'AGS', 'AGS', 'AGS', 
                          'AGS', 'AGS', 'AGS', 'CM',  'CM',   'CM', 
                          'CM',  'COA', 'COA', 'COA', 'COA', 'COA', 
                          'COA', 'COA', 'COL', 'COL'],
                   'DIST':[1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1, 
                         1, 1, 2, 2, 3, 3, 1, 2],
                   'FA':[350, 320, 350, 360, 380, 380, 480,480,488,
                         780, 320, 310, 250, 141, 564, 564, 437, 438,
                         287, 287, 560, 560],
                   'ID':[1, 1, 2, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2, 
                         3, 3, 1, 2, 3, 3, 1, 2],
                   'LTT':['A', 'A', 'B', 'B', 'B', 'C',
                          'C', 'C', 'B', 'C', 'A', 'E',
                          'S', 'B', 'B', 'C', 'C', 'A',
                          'C', 'A', 'E', 'S']
                   })

What I have in mind is to use the dictionary to iterate over the dataframe somehow and generate what I actually need: a sample for each combination of the CTY and DIST filters, as written below, and then concatenate those results:

    df1 = c[(c['CTY'] == 'AGS') & (c['DIST'] == 1)].sample(n=2)
    df2 = c[(c['CTY'] == 'AGS') & (c['DIST'] == 2)].sample(n=2)
    df3 = c[(c['CTY'] == 'AGS') & (c['DIST'] == 3)].sample(n=2)
    df4 = c[(c['CTY'] == 'CM') & (c['DIST'] == 1)].sample(n=2)
    df5 = c[(c['CTY'] == 'CM') & (c['DIST'] == 2)].sample(n=2)
    # ... one line like this for every other (CTY, DIST) pair ...
    # note: COL has a single row per DIST, so n=2 raises ValueError here
    dfn = c[(c['CTY'] == 'COL') & (c['DIST'] == 2)].sample(n=2)

    pd.concat([df1, df2, df3, df4, df5, dfn])

What ideas do you have? I would like the fastest approach possible, maybe with a list comprehension. I would appreciate any help.
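The manual df1 ... dfn pattern in the question can be generated from the dictionary with a list comprehension. A minimal sketch, assuming `c` and `dic` as defined in the question: it visits each distinct DIST value of every `dic[key]` (via `np.unique`), filters and samples that (CTY, DIST) pair, and concatenates the pieces. The helper `sample_pair` is a hypothetical name, and the `min(n, len(sub))` guard is an addition: both COL pairs have only one row, and `sample(n=2)` raises a `ValueError` on them with the default `replace=False`.

```python
import numpy as np
import pandas as pd

# Data from the question
c = pd.DataFrame({
    'CTY': ['AGS'] * 9 + ['CM'] * 4 + ['COA'] * 7 + ['COL'] * 2,
    'DIST': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1,
             1, 1, 2, 2, 3, 3, 1, 2],
    'FA': [350, 320, 350, 360, 380, 380, 480, 480, 488,
           780, 320, 310, 250, 141, 564, 564, 437, 438,
           287, 287, 560, 560],
    'ID': [1, 1, 2, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2,
           3, 3, 1, 2, 3, 3, 1, 2],
    'LTT': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'B', 'C', 'A', 'E',
            'S', 'B', 'B', 'C', 'C', 'A', 'C', 'A', 'E', 'S'],
})
dic = {'AGS': np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=np.int64),
       'CM': np.array([1, 1, 2, 2], dtype=np.int64),
       'COA': np.array([1, 1, 1, 2, 2, 3, 3], dtype=np.int64),
       'COL': np.array([1, 2], dtype=np.int64)}

n = 2  # rows wanted per (CTY, DIST) pair


def sample_pair(cty, dist, n):
    """Hypothetical helper: up to n random rows where CTY == cty and DIST == dist."""
    sub = c[(c['CTY'] == cty) & (c['DIST'] == dist)]
    # Cap n at the group size so one-row groups (COL) do not raise
    return sub.sample(n=min(n, len(sub)))


# One sample per distinct (CTY, DIST) pair, then a single concat
samples = pd.concat([sample_pair(cty, dist, n)
                     for cty, dists in dic.items()
                     for dist in np.unique(dists)])
print(samples)
```

Each filter still scans the whole frame once per pair, so this mainly removes the copy-pasted lines rather than speeding anything up.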

Upvotes: 1

Views: 62

Answers (1)

anky

Reputation: 75150

IIUC, you may need a dictionary comprehension:

d = {key: c[c['CTY'].eq(key) & c['DIST'].isin(set(val))].sample(n=2) 
     for key,val in dic.items()}

After this, you can access each sampled dataframe by its key.

Example:

print(d['AGS'])

   CTY  DIST   FA  ID LTT
3  AGS     2  360   2   B
1  AGS     1  320   1   A

To concatenate them:

pd.concat(d)  # the dict keys (CTY) become the outer index level, so the identifier is kept
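Passing a dict to `pd.concat` makes its keys the outer level of a MultiIndex, which is what keeps the CTY identifier. A small sketch, assuming the question's `dic` and only the CTY and DIST columns of `c` (the other columns do not affect which rows are selected); `random_state=0` is added only to make the sampling reproducible:

```python
import numpy as np
import pandas as pd

# Only the CTY and DIST columns of the question's data
c = pd.DataFrame({
    'CTY': ['AGS'] * 9 + ['CM'] * 4 + ['COA'] * 7 + ['COL'] * 2,
    'DIST': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1,
             1, 1, 2, 2, 3, 3, 1, 2],
})
dic = {'AGS': np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=np.int64),
       'CM': np.array([1, 1, 2, 2], dtype=np.int64),
       'COA': np.array([1, 1, 1, 2, 2, 3, 3], dtype=np.int64),
       'COL': np.array([1, 2], dtype=np.int64)}

# The answer's dictionary comprehension (random_state added for reproducibility)
d = {key: c[c['CTY'].eq(key) & c['DIST'].isin(set(val))]
          .sample(n=2, random_state=0)
     for key, val in dic.items()}

out = pd.concat(d)
# Outer index level = dict key (the CTY identifier), inner level = original row label
print(out.index.get_level_values(0).unique().tolist())  # ['AGS', 'CM', 'COA', 'COL']
print(out.loc['CM'])  # the two rows sampled for CM
```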

or

pd.concat((c[c['CTY'].eq(key) & c['DIST'].isin(set(val))].sample(n=2) 
           for key,val in dic.items()))
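Note that the dictionary comprehension above draws 2 rows per CTY: each `dic[key]` contains every DIST value of that CTY, so the `isin` filter keeps all of that CTY's rows. If you need 2 rows per (CTY, DIST) pair, as in the question's manual code, one vectorized alternative (a different technique from the one in this answer) is to shuffle the whole frame once with `sample(frac=1)` and keep the first 2 rows of each group with `groupby(...).head(2)`. Groups with fewer than 2 rows (both COL groups here) return what they have instead of raising. A sketch, assuming the question's `c`; `random_state=0` only makes the result reproducible:

```python
import pandas as pd

# Data from the question
c = pd.DataFrame({
    'CTY': ['AGS'] * 9 + ['CM'] * 4 + ['COA'] * 7 + ['COL'] * 2,
    'DIST': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1,
             1, 1, 2, 2, 3, 3, 1, 2],
    'FA': [350, 320, 350, 360, 380, 380, 480, 480, 488,
           780, 320, 310, 250, 141, 564, 564, 437, 438,
           287, 287, 560, 560],
    'ID': [1, 1, 2, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2,
           3, 3, 1, 2, 3, 3, 1, 2],
    'LTT': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'B', 'C', 'A', 'E',
            'S', 'B', 'B', 'C', 'C', 'A', 'C', 'A', 'E', 'S'],
})

n = 2  # rows wanted per (CTY, DIST) pair

# Shuffle all rows once, then keep the first n rows of every (CTY, DIST) group.
# groupby(...).head(n) returns fewer rows for groups smaller than n instead of raising.
out = (c.sample(frac=1, random_state=0)
        .groupby(['CTY', 'DIST'])
        .head(n)
        .sort_values(['CTY', 'DIST']))
print(out)
print(out.groupby(['CTY', 'DIST']).size())
```

Because it makes a single pass instead of filtering the frame once per pair, this is likely to scale better on large frames. `GroupBy.sample` (pandas 1.1+) is another option, but its `n` must be no larger than the smallest group unless `replace=True`, so it would fail on the one-row COL groups here.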

Upvotes: 1
