r-learning-machine

Reputation: 115

How to make a random sample of panel data (keeping all years for each randomly selected id)

I am using an unbalanced panel dataset with multiple ids, each with one or more years of data. I would like to work with a smaller dataset while I build my code, so I want to choose ids at random, but for each randomly picked id I need to keep all of its years of data.

import pandas as pd
df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'], 
                   'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})

print(df)

   id  value
0   1  40000
1   1  50000
2   2  42000
3   2  20000
4   3  20000
5   4  25000
6   4  27000
7   5  20000
8   6  23000
9   7  50000
10  7  22000

Suppose I wanted to sample two ids to create a new panel, and id = 7 and id = 1 were randomly selected. I would then need the rows with index values 0, 1, 9, 10.
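As a quick check of the expected result, the fixed (non-random) choice of ids '1' and '7' from the example can be passed to `Series.isin`. This is only a sketch that rebuilds the example frame so it runs on its own; note that the ids are stored as strings.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
                   'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})

chosen_ids = ['7', '1']           # fixed choice from the example, not random
mask = df['id'].isin(chosen_ids)  # True for every row (year) of a chosen id
subset = df[mask]                 # rows keep their original order and labels
print(subset.index.tolist())      # [0, 1, 9, 10]
```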

Upvotes: 0

Views: 404

Answers (1)

inquirer

Reputation: 4823

Randomly select two distinct ids, then get the index labels of their rows and, if needed, the rows themselves.

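import pandas as pd
import random

df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
                   'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})

# Draw two distinct ids from the unique ids, without replacement
# (random.choices on the raw column could pick the same id twice).
rrr = random.sample(sorted(df['id'].unique()), k=2)  # e.g. ['1', '7']
# Index labels of every row belonging to the selected ids
index = df.loc[df['id'].isin(rrr)].index.tolist()  # e.g. [0, 1, 9, 10]
# Select those rows by label (.loc, since these are labels, not positions)
value = df.loc[index]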

Output index (when ids '1' and '7' are drawn)

[0, 1, 9, 10]

Output values

   id  value
0   1  40000
1   1  50000
9   7  50000
10  7  22000
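A pandas-only variant is sketched below as an alternative, not as the method used above. It samples from the de-duplicated `id` column with `Series.sample`, so every id is equally likely no matter how many years it has, and `random_state` makes the draw reproducible while the code is being built. The helper name `sample_panel_ids` and its parameters are hypothetical.

```python
import pandas as pd


def sample_panel_ids(df, id_col='id', n=None, frac=None, random_state=None):
    """Hypothetical helper: return all rows for a random subset of ids.

    Pass either n (number of ids) or frac (fraction of ids), as in Series.sample.
    """
    # One entry per id, so ids with many years are not over-weighted
    unique_ids = df[id_col].drop_duplicates()
    # Sample ids without replacement (Series.sample defaults to replace=False)
    chosen = unique_ids.sample(n=n, frac=frac, random_state=random_state)
    # Keep every row (every year) of each chosen id, in the original order
    return df[df[id_col].isin(chosen)]


df = pd.DataFrame({'id': ['1', '1', '2', '2', '3', '4', '4', '5', '6', '7', '7'],
                   'value': [40000, 50000, 42000, 20000, 20000, 25000, 27000, 20000, 23000, 50000, 22000]})

small = sample_panel_ids(df, n=2, random_state=0)
print(small)
print(small.index.tolist())
```

To shrink a large panel by proportion instead of a fixed count, something like `sample_panel_ids(df, frac=0.1, random_state=42)` keeps about 10% of the ids, each with all of its years.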

Upvotes: 1
