Reputation: 159
My dataset contains the ID, gender, village and crop area (in hectares) of different farmers.
I have to form groups of farmers so that each group's total crop area is 5 hectares. Within each group, four farmers are then selected at random, with the condition that the selection covers at least 25% of the women farmers' crop area in that group.
I have been trying to work out how to do this, but I am stuck on finding a correct solution.
Here is my dummy data set:
    Farmer_id Gender Village  Crop_area
0           1      F  Nashik       1.00
1           2      F  Nashik       0.50
2           3      M  Nashik       1.00
3           4      M  Nashik       0.80
4           5      M  Nashik       0.60
5           6      M  Nashik       0.10
6           7      M  Nashik       1.00
7           8      F  Nashik       0.60
8           9      F  Nashik       1.00
9          10      F  Nashik       0.29
10         11      M  Nashik       0.70
11         12      M  Nashik       1.00
12         13      M  Nashik       0.41
13         14      M  Nashik       1.00
Here is what I have so far:
df['Crop_Area_Cum'] = df['Crop_area'].cumsum()
grouped = df.groupby(df.Gender)
df_male = grouped.get_group("M")
df_female = grouped.get_group("F")
df['Sample']=4
df['Selected Farmers'] = df['Sample'].apply(np.ceil).astype(int)
df['Selected Farmers'] = df.groupby('Gender').apply(lambda df: df['Village'].sample(df['Selected Farmers'].iat[0])).reset_index(level=0)['Village']
df['Selected Farmers'] = df['Selected Farmers'].fillna('')
    Farmer_id Gender  ...  Sample Selected Farmers
0           1      F  ...       4           Nashik
1           2      F  ...       4           Nashik
2           3      M  ...       4
3           4      M  ...       4
4           5      M  ...       4
5           6      M  ...       4           Nashik
6           7      M  ...       4           Nashik
7           8      F  ...       4           Nashik
8           9      F  ...       4           Nashik
9          10      F  ...       4
10         11      M  ...       4
11         12      M  ...       4
12         13      M  ...       4           Nashik
13         14      M  ...       4           Nashik
The output is not correct, because the sampling follows none of the criteria.
Upvotes: 1
Views: 578
Reputation: 3720
The first idea is to assign a random group number to each row, then sum the crop area per group to check that each group's total is close to 5 hectares, and to repeat until that condition holds. Random selection flags are then assigned to each row so that each group has 4 Trues and the rest Falses.
The next step is to keep reassigning those random selection flags until the selected women represent at least 25% of the total female crop area in each group. Rerun the code until a solution is achieved.
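import random
import numpy as np
import pandas as pd
nb_groups = 2  # number of groups to form
min_acceptable_female_prp = 0.25  # minimum selected share of each group's female crop area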
The random selector function:
def rnd_sel(x):
    # four True flags, the rest False, shuffled into random order
    arr = [True]*4+[False]*(len(x)-4)
    np.random.shuffle(arr)
    return arr
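As a quick sanity check of the selector, here is a minimal sketch that calls it on its own. The helper is repeated so the snippet runs standalone, and the input lengths 8 and 3 are arbitrary values chosen for illustration.

```python
import numpy as np

def rnd_sel(x):
    # same helper as above: four True flags, the rest False, in random order
    arr = [True] * 4 + [False] * (len(x) - 4)
    np.random.shuffle(arr)
    return arr

flags = rnd_sel(range(8))
print(flags)       # order varies from run to run
print(sum(flags))  # number of True flags

# Caveat: for an input shorter than 4 the list still holds 4 flags,
# so its length no longer matches the input.
short = rnd_sel(range(3))
print(len(short))
```

Because of that caveat, a group with fewer than four farmers would likely make the `transform` call fail, so such groupings are best rejected before selecting.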
The main processing loop:
for i in range(100):
    # assign every farmer to a random group
    dfg = df.assign(Group=random.choices(range(1,nb_groups+1), k=len(df)))
    # accept the grouping only if every group's total area is within 0.5 of 5 hectares
    if (dfg.groupby('Group').sum(numeric_only=True)['Crop_area']-5).abs().max() < 0.5:
        dfg.sort_values(['Group','Gender'], inplace=True)
        print(f'\nGroup area solve iteration: {i}\n')
        dfg_s = dfg.groupby(['Group','Gender']).sum(numeric_only=True)
        dfg_s['Tot_Grp_Crop_area'] = dfg_s.groupby('Group')['Crop_area'].transform('sum')
        print(dfg_s)
        # make sure females are present in each group
        if len(dfg_s.loc[pd.IndexSlice[:, 'F'], :]) == nb_groups:
            dfg['Selected'] = dfg.groupby('Group')['Group'].transform(rnd_sel)
            print()
            print(dfg)
            dfg_sf = dfg[dfg['Selected']].groupby(['Group','Gender']).sum(numeric_only=True)
            print(dfg_sf)
            # make sure at least one female was selected in each group
            if len(dfg_sf.loc[pd.IndexSlice[:, 'F'], :]) == nb_groups:
                dfg_s['Gender_Selected_area'] = dfg_sf['Crop_area']
                dfg_s['Gender_Selected_area_prp'] = dfg_s['Gender_Selected_area']/dfg_s['Crop_area']
                print(dfg_s)
                min_female_prp = dfg_s.loc[pd.IndexSlice[:, 'F'], :]['Gender_Selected_area_prp'].min()
                if min_female_prp >= min_acceptable_female_prp:
                    print(f'\nSolution achieved with minimum female crop area representation of {min_female_prp*100:.1f}%')
                else:
                    print('*** solution not achieved')
        # stop at the first grouping that passes the area check; rerun if no solution was printed
        break
Associated output:
Group area solve iteration: 0
              Farmer_id  Crop_area  Tot_Grp_Crop_area
Group Gender
1     F              11       2.10               5.41
      M              38       3.31               5.41
2     F              19       1.29               4.59
      M              37       3.30               4.59
    Farmer_id Gender Village  Crop_area  Group  Selected
0           1      F  Nashik       1.00      1     False
1           2      F  Nashik       0.50      1      True
7           8      F  Nashik       0.60      1      True
2           3      M  Nashik       1.00      1      True
3           4      M  Nashik       0.80      1      True
5           6      M  Nashik       0.10      1     False
11         12      M  Nashik       1.00      1     False
12         13      M  Nashik       0.41      1     False
8           9      F  Nashik       1.00      2      True
9          10      F  Nashik       0.29      2     False
4           5      M  Nashik       0.60      2      True
6           7      M  Nashik       1.00      2     False
10         11      M  Nashik       0.70      2      True
13         14      M  Nashik       1.00      2      True
              Farmer_id  Crop_area  Selected
Group Gender
1     F              10        1.1         2
      M               7        1.8         2
2     F               9        1.0         1
      M              30        2.3         3
              Farmer_id  Crop_area  Tot_Grp_Crop_area  Gender_Selected_area  \
Group Gender
1     F              11       2.10               5.41                   1.1
      M              38       3.31               5.41                   1.8
2     F              19       1.29               4.59                   1.0
      M              37       3.30               4.59                   2.3
              Gender_Selected_area_prp
Group Gender
1     F                       0.523810
      M                       0.543807
2     F                       0.775194
      M                       0.696970
Solution achieved with minimum female crop area representation of 52.4%
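This approach relies on rerunning the code until a solution appears. One way to automate the retries is sketched below. It is a variation for illustration, not the exact code above: the names `solve` and `_select`, their parameters, and the choice to redraw both the grouping and the selection after any failed check are all assumptions.

```python
import random
import pandas as pd

def _select(part, selected, rng, n_pick, min_female_prp):
    """Flag n_pick random farmers of one group; report whether the women's share is met."""
    if len(part) < n_pick:
        return False
    picked = rng.sample(list(part.index), n_pick)
    selected.loc[picked] = True
    women = part.loc[part["Gender"] == "F", "Crop_area"]
    if women.sum() == 0:
        return False
    return women[women.index.isin(picked)].sum() / women.sum() >= min_female_prp

def solve(df, nb_groups=2, target=5.0, tol=0.5, n_pick=4,
          min_female_prp=0.25, max_tries=10_000, seed=None):
    """Redraw groups and selections until every check passes; None if none is found."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        # random group label for every farmer
        dfg = df.assign(Group=[rng.randint(1, nb_groups) for _ in range(len(df))])
        area = dfg.groupby("Group")["Crop_area"].sum()
        # every group must exist and have a total area close to the target
        if len(area) < nb_groups or (area - target).abs().max() >= tol:
            continue
        selected = pd.Series(False, index=dfg.index)
        if all(_select(part, selected, rng, n_pick, min_female_prp)
               for _, part in dfg.groupby("Group")):
            return dfg.assign(Selected=selected)
    return None

# The sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "Farmer_id": range(1, 15),
    "Gender": list("FFMMMMMFFFMMMM"),
    "Village": ["Nashik"] * 14,
    "Crop_area": [1.00, 0.50, 1.00, 0.80, 0.60, 0.10, 1.00,
                  0.60, 1.00, 0.29, 0.70, 1.00, 0.41, 1.00],
})
result = solve(df, seed=0)
print(result)
```

Returning `None` after `max_tries` keeps the loop from running forever when the constraints cannot be met for a given dataset.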
Upvotes: 1
Reputation: 20505
There's more than one way to tackle this.
Rejection Sampling is perhaps the simplest.
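Your new function should return a suitable 4-tuple of farmers.
1. Pick four distinct farmers at random, for example by adding random picks to a set() until it holds four.
2. Check the constraints on those four. If they pass, return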
the four, or start from scratch at step (1.)
Upvotes: 2
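The rejection sampling idea can be sketched in plain Python for a single group. This is a minimal illustration under stated assumptions: the function name `pick_four`, the tuple layout `(farmer_id, gender, crop_area)`, and the reading of the 25% rule as "selected female area divided by the group's total female area" are choices made here, not something the answer specifies.

```python
import random

def pick_four(farmers, min_female_share=0.25, max_tries=10_000, rng=random):
    """Rejection sampling over ONE group of (farmer_id, gender, crop_area) tuples."""
    female_total = sum(area for _, gender, area in farmers if gender == "F")
    if female_total == 0:
        raise ValueError("group has no female crop area")
    for _ in range(max_tries):
        # step 1: propose four distinct farmers uniformly at random
        four = rng.sample(farmers, 4)
        # step 2: accept only if the chosen women cover enough of the female area
        female_picked = sum(area for _, gender, area in four if gender == "F")
        if female_picked >= min_female_share * female_total:
            return four
        # otherwise reject the proposal and start again from step 1
    raise RuntimeError("no acceptable sample found")

# Group 1 from the previous answer's output, written out by hand.
group = [(1, "F", 1.00), (2, "F", 0.50), (8, "F", 0.60), (3, "M", 1.00),
         (4, "M", 0.80), (6, "M", 0.10), (12, "M", 1.00), (13, "M", 0.41)]
four = pick_four(group)
print(four)  # varies from run to run
```

Each accepted draw is uniform over the four-farmer subsets that satisfy the constraint, which is the main appeal of rejection sampling over patching up a failed draw.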