Reputation: 2871
I have a data frame as shown below.
ID
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
It has only one column, ID, with 20 unique values. I want to randomly pick 25% of the unique values of ID (5 of the 20) and create a new column, OWNER_ID, by randomly assigning those picked values across the 20 rows, with 10% of the rows (2 rows) left missing.
Each randomly picked ID must own itself. For example, if 2 is randomly picked as one of the OWNER_ID values, then whenever ID is 2, OWNER_ID should also be 2.
For example, suppose I randomly picked 2, 3, 8, 9 and 11.
The expected output would then be:
ID OWNER_ID
1 2
2 2
3 3
4 11
5 9
6 11
7 11
8 8
9 9
10 2
11 11
12 2
13 na
14 8
15 9
16 8
17 9
18 2
19 2
20 na
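The rules above can be written down as a check, which is handy for testing whatever generator ends up being used. This is a minimal sketch, assuming the expected output is loaded as a DataFrame with `None` for the two `na` rows; `check_owner_ids` and its `pick_frac`/`missing_frac` parameters are hypothetical names, not an existing pandas API:

```python
import pandas as pd

# The expected output from the question; None stands in for the two "na" rows.
expected = pd.DataFrame({
    'ID': range(1, 21),
    'OWNER_ID': [2, 2, 3, 11, 9, 11, 11, 8, 9, 2,
                 11, 2, None, 8, 9, 8, 9, 2, 2, None],
})

def check_owner_ids(df, picked, pick_frac=0.25, missing_frac=0.10):
    """Hypothetical helper: True if df satisfies the rules stated in the question."""
    owner = df['OWNER_ID']
    # 25% of the unique IDs were picked.
    right_count = len(set(picked)) == round(pick_frac * df['ID'].nunique())
    # Every picked ID owns itself (a NaN on a picked row counts as a violation).
    rows = df['ID'].isin(picked)
    self_owned = (owner[rows] == df.loc[rows, 'ID']).all()
    # Every non-missing OWNER_ID is one of the picked IDs.
    from_picks = owner.dropna().astype(int).isin(picked).all()
    # Exactly missing_frac of the rows are missing.
    right_missing = owner.isna().sum() == round(missing_frac * len(df))
    return bool(right_count and self_owned and from_picks and right_missing)

print(check_owner_ids(expected, picked=[2, 3, 8, 9, 11]))  # True
```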
I just don't know how to start with this, so I haven't tried anything yet. I am just learning random data generation with pandas.
Upvotes: 1
Views: 108
Reputation: 75100
Maybe you can try a custom function like this:
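import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': range(1, 21)})

def myfunc(d):
    s = d['ID'].sample(frac=.25)  # randomly pick 25% of the IDs (5 of 20)
    d = d.assign(owner_id=s)      # picked rows own themselves, all other rows are NaN
    # fill the remaining rows by drawing randomly from the picked IDs
    fill_na = pd.Series(np.random.choice(d['owner_id'].dropna(), size=len(d)), index=d.index)  #thanks @jezrael
    d['owner_id'] = d['owner_id'].fillna(fill_na)
    d.loc[d.sample(frac=.10).index, 'owner_id'] = np.nan  # blank out 10% of the rows
    return d

myfunc(df)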
ID owner_id
0 1 3.0
1 2 19.0
2 3 3.0
3 4 3.0
4 5 5.0
5 6 3.0
6 7 8.0
7 8 8.0
8 9 NaN
9 10 3.0
10 11 5.0
11 12 3.0
12 13 19.0
13 14 9.0
14 15 5.0
15 16 19.0
16 17 NaN
17 18 9.0
18 19 19.0
19 20 9.0
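In the sample output above, 9 is one of the picked values (IDs 14, 18 and 20 point to it), yet the row with ID 9 is NaN. That happens because the 10% blanking step samples from all rows, so it can land on a picked row, while the question asks that a picked ID always map to itself. Below is a minimal sketch of one possible adjustment, assuming the missing rows should be drawn only from the non-picked rows. `assign_owner_ids` and its `pick_frac`, `missing_frac` and `seed` parameters are hypothetical names, not part of the original answer:

```python
import numpy as np
import pandas as pd

def assign_owner_ids(d, pick_frac=0.25, missing_frac=0.10, seed=None):
    """Hypothetical variant of myfunc: picked IDs always own themselves,
    and the missing rows are drawn only from the non-picked rows."""
    rng = np.random.default_rng(seed)
    ids = d['ID'].unique()
    picked = rng.choice(ids, size=round(pick_frac * len(ids)), replace=False)
    # Every row gets a random owner drawn from the picked IDs...
    owner = pd.Series(rng.choice(picked, size=len(d)), index=d.index, dtype='float')
    # ...except that the picked IDs map to themselves.
    is_picked = d['ID'].isin(picked)
    owner.loc[is_picked] = d.loc[is_picked, 'ID']
    # Blank out missing_frac of the rows, chosen among the non-picked rows only.
    n_missing = round(missing_frac * len(d))
    blank = rng.choice(d.index[~is_picked.to_numpy()], size=n_missing, replace=False)
    owner.loc[blank] = np.nan
    return d.assign(owner_id=owner), sorted(picked)

df = pd.DataFrame({'ID': range(1, 21)})
out, picked = assign_owner_ids(df, seed=0)
print(picked)
print(out)
```

Passing a `seed` to `np.random.default_rng` makes a run reproducible; leave it as `None` to get a fresh draw each time.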
Upvotes: 1