Reputation: 157
I have a dictionary like this:
import numpy as np

dic = {'AGS': np.array([1, 1, 1, 2, 2, 2, 3, 3, 3], dtype='int64'),
       'CM': np.array([1, 1, 2, 2], dtype='int64'),
       'COA': np.array([1, 1, 1, 2, 2, 3, 3], dtype='int64'),
       'COL': np.array([1, 2], dtype='int64')}
And a dataframe like this:
import pandas as pd

c = pd.DataFrame(data={'CTY': ['AGS', 'AGS', 'AGS', 'AGS', 'AGS', 'AGS',
                               'AGS', 'AGS', 'AGS', 'CM', 'CM', 'CM',
                               'CM', 'COA', 'COA', 'COA', 'COA', 'COA',
                               'COA', 'COA', 'COL', 'COL'],
                       'DIST': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1,
                                1, 1, 2, 2, 3, 3, 1, 2],
                       'FA': [350, 320, 350, 360, 380, 380, 480, 480, 488,
                              780, 320, 310, 250, 141, 564, 564, 437, 438,
                              287, 287, 560, 560],
                       'ID': [1, 1, 2, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2,
                              3, 3, 1, 2, 3, 3, 1, 2],
                       'LTT': ['A', 'A', 'B', 'B', 'B', 'C',
                               'C', 'C', 'B', 'C', 'A', 'E',
                               'S', 'B', 'B', 'C', 'C', 'A',
                               'C', 'A', 'E', 'S']
                       })
My idea is to use the dictionary to iterate over the dataframe and, for each CTY and DIST combination, draw a sample from the filtered rows, as written below, and then concatenate the results:
df1 = c[(c['CTY'] == 'AGS') &
        (c['DIST'] == 1)].sample(n=2)
df2 = c[(c['CTY'] == 'AGS') &
        (c['DIST'] == 2)].sample(n=2)
df3 = c[(c['CTY'] == 'AGS') &
        (c['DIST'] == 3)].sample(n=2)
df4 = c[(c['CTY'] == 'CM') &
        (c['DIST'] == 1)].sample(n=2)
df5 = c[(c['CTY'] == 'CM') &
        (c['DIST'] == 2)].sample(n=2)
.
.
.
dfn = c[(c['CTY'] == 'COL') &
        (c['DIST'] == 2)].sample(n=2)
pd.concat([df1, df2, df3, df4, df5, dfn])
What ideas do you have? I would like the fastest solution possible, maybe with a list comprehension. I would appreciate any help.
Upvotes: 1
Views: 62
Reputation: 75150
IIUC, you may need a dictionary comprehension:
d = {key: c[c['CTY'].eq(key) & c['DIST'].isin(set(val))].sample(n=2)
for key,val in dic.items()}
After this, you can look up each key to get its sampled dataframe:
Example:
print(d['AGS'])
   CTY  DIST   FA  ID LTT
3  AGS     2  360   2   B
1  AGS     1  320   1   A
To concatenate:
pd.concat(d)  # keeps each dictionary key as the outer index level
or
pd.concat((c[c['CTY'].eq(key) & c['DIST'].isin(set(val))].sample(n=2)
for key,val in dic.items()))
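Note that the comprehension above draws two rows per CTY across all of its listed districts. If the goal is two rows per (CTY, DIST) pair, as the df1 ... dfn lines in the question suggest, a minimal sketch could look like the following. The helper name sample_pair and the fixed seed are hypothetical, added only for illustration. Capping n at the group size is also an assumption: in the sample data each COL district has a single row, so a plain sample(n=2) would raise there.

```python
import numpy as np
import pandas as pd

dic = {'AGS': np.array([1, 1, 1, 2, 2, 2, 3, 3, 3]),
       'CM': np.array([1, 1, 2, 2]),
       'COA': np.array([1, 1, 1, 2, 2, 3, 3]),
       'COL': np.array([1, 2])}

c = pd.DataFrame({
    'CTY': ['AGS'] * 9 + ['CM'] * 4 + ['COA'] * 7 + ['COL'] * 2,
    'DIST': [1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 1,
             1, 1, 2, 2, 3, 3, 1, 2],
    'FA': [350, 320, 350, 360, 380, 380, 480, 480, 488,
           780, 320, 310, 250, 141, 564, 564, 437, 438,
           287, 287, 560, 560],
    'ID': [1, 1, 2, 2, 3, 1, 2, 3, 1, 1, 1, 2, 2, 2,
           3, 3, 1, 2, 3, 3, 1, 2],
    'LTT': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'B', 'C', 'A',
            'E', 'S', 'B', 'B', 'C', 'C', 'A', 'C', 'A', 'E', 'S'],
})


def sample_pair(frame, cty, dist, n=2, seed=0):
    """Hypothetical helper: up to n random rows for one (CTY, DIST) pair."""
    rows = frame[frame['CTY'].eq(cty) & frame['DIST'].eq(dist)]
    # Assumption: groups smaller than n are returned whole instead of raising.
    return rows.sample(n=min(n, len(rows)), random_state=seed)


# One sample per (CTY, DIST) pair listed in the dictionary, then one concat.
result = pd.concat([sample_pair(c, cty, dist)
                    for cty, dists in dic.items()
                    for dist in np.unique(dists)])

# Number of sampled rows per pair.
print(result.groupby(['CTY', 'DIST']).size())
```

If every group is known to have at least n rows, c.groupby(['CTY', 'DIST']).sample(n=2) (pandas 1.1 or later) should give the same kind of result in a single call, without going through the dictionary.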
Upvotes: 1