RnK
RnK

Reputation: 117

randomly subsample once every month pandas

I have the following dataframe.

data = {'bid':['23243', '23243', '23243', '12145', '12145', '12145', '12145'],
        'lid':['54346786', '23435687', '23218987', '43454432', '21113567', '54789876', '22898721'],
        'date':['2021-08-11','2021-08-12','2021-09-17','2021-05-02','2021-05-11','2021-05-20','2021-08-13'],
        'val1':[44,34,54,76,89,33,27],
        'val2':[11,55,43,89,76,44,88]}
 
df = pd.DataFrame(data)

enter image description here

What I am looking for is to randomly pick a lid per month for the bid column, and maintain a count of past instances until the point of the random sample, something similar to this:

enter image description here

I can think of separating the year and months into different columns and then apply pd.groupby on the bid, year and month with the pd.Series.sample function, but there must be a better way of doing it.

Upvotes: 1

Views: 499

Answers (2)

jezrael
jezrael

Reputation: 862671

Use GroupBy.cumcount per bid and then per months and bid use DataFrameGroupBy.sample:

df['date'] = pd.to_datetime(df['date'])

#if necessary sorting
#df = df.sort_values(['bid','date'])

df['prev'] = df.groupby('bid').cumcount()
df1 = df.groupby(['bid', pd.Grouper(freq='M', key='date')], sort=False).sample(n=1)

print (df1)
     bid       lid       date  val1  val2  prev
1  23243  23435687 2021-08-12    34    55     1
2  23243  23218987 2021-09-17    54    43     2
5  12145  54789876 2021-05-20    33    44     2
6  12145  22898721 2021-08-13    27    88     3

Upvotes: 1

Corralien
Corralien

Reputation: 120409

IIUC, use groupby.sample, assume date column have datetime64 dtype:

out = df.groupby([df['date'].dt.month, 'bid']).sample(n=1).reset_index(drop=True)
print(out)

# Output
     bid       lid       date  val1  val2
0  12145  21113567 2021-05-11    89    76
1  12145  22898721 2021-08-13    27    88
2  23243  23435687 2021-08-12    34    55
3  23243  23218987 2021-09-17    54    43

Upvotes: 0

Related Questions