Reputation: 115
I am using an unbalanced panel dataset with multiple ids, each with one or more years of data. I would like to work with a smaller dataset while I build my code. I would therefore like to randomly choose ids, and for each id that is picked, keep all of the yearly rows for that id.
import pandas as pd
df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})
print(df)
id value
0 1 40000
1 1 50000
2 2 42000
3 2 20000
4 3 20000
5 4 25000
6 4 27000
7 5 20000
8 6 23000
9 7 50000
10 7 22000
Suppose I want to sample two ids to create a new panel. If I randomly select id = 7 and id = 1, I would need the rows with index values 0, 1, 9 and 10.
Upvotes: 0
Views: 404
Reputation: 4823
Randomly select two 'id' values, then get the index labels of their rows and, if needed, the rows themselves.
import pandas as pd
import random
df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})
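# random.sample draws without replacement from the unique ids, so the two ids are always different
ids = random.sample(list(df['id'].unique()), k=2)  # e.g. ['1', '7']
index = df.loc[df['id'].isin(ids)].index  # e.g. [0, 1, 9, 10]: index labels of every row for those ids
value = df.loc[index]  # rows for the selected ids (.loc, because index holds labels, not positions)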
Output index
[0, 1, 9, 10]
Output values
id value
0 1 40000
1 1 50000
9 7 50000
10 7 22000
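If you want the same subset every time you run the script, a pandas-only variant may be convenient. The sketch below is one possible way to do it, not the only one: the helper name `sample_ids` and its `seed` parameter are made up for illustration. It relies on `Series.sample`, which draws without replacement by default and accepts `random_state`.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
    'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000],
})

def sample_ids(frame, n_ids, id_col='id', seed=None):
    """Hypothetical helper: return all rows of n_ids distinct ids chosen at random."""
    # Deduplicate first so each id is equally likely, however many rows it has.
    unique_ids = frame[id_col].drop_duplicates()
    # Series.sample draws without replacement by default; random_state makes it repeatable.
    chosen = unique_ids.sample(n=n_ids, random_state=seed)
    # The boolean mask keeps every row of each chosen id and preserves the original index.
    return frame[frame[id_col].isin(chosen)]

subset = sample_ids(df, n_ids=2, seed=0)
print(subset.index.tolist())
print(subset)
```

Because the ids are deduplicated before sampling, an id with many years of data is no more likely to be picked than an id with a single year.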
Upvotes: 1