Reputation: 159
I know how to randomly sample a few rows from a pandas DataFrame using the sample method:
df_sample = df.sample(n=10)
However, what I need is to randomly select values of one column (i.e. Village) from the data frame below and keep every row that belongs to them.
For example: I want to randomly select 3 villages, say Village A, B and C, and get the entire data for those 3 villages as output.
Here is my code:
>>> import pandas as pd
>>> import numpy as np
>>> df=pd.read_excel("/home/Study.xlsx")
>>> df=df.sample(n=3)
>>> df
Sr.No ... Village
16 17 ... I
33 34 ... Q
36 37 ... S
So, if villages I, Q and S are randomly selected, I need the entire data for these 3 villages.
Thanks.
Upvotes: 1
Views: 77
Reputation: 862661
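Use numpy.random.choice on the unique village names to pick 3 random villages, then filter with Series.isin and boolean indexing. Pass replace=False, because the default samples with replacement and could pick the same village twice:
vil = np.random.choice(df['Village'].unique(), 3, replace=False)
df = df[df['Village'].isin(vil)]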
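As a self-contained sketch of the NumPy approach: np.random.choice draws with replacement unless told otherwise, so without replace=False fewer than 3 distinct villages may come back. The toy frame below is made up for illustration (the real data comes from Study.xlsx), and the variable names are only placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Study.xlsx: 5 villages, 4 rows each.
toy = pd.DataFrame({
    "Sr.No": range(1, 21),
    "Village": np.repeat(list("ABCDE"), 4),
})

# replace=False ensures the 3 chosen villages are distinct.
chosen = np.random.choice(toy["Village"].unique(), size=3, replace=False)

# Keep every row whose village was chosen.
subset = toy[toy["Village"].isin(chosen)]

print(sorted(chosen))
print(subset["Village"].nunique())  # 3
print(len(subset))                  # 12, i.e. 3 villages x 4 rows
```

Which villages are printed changes from run to run, but the subset always contains exactly 3 villages with all of their rows.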
A pandas-only solution with Series.drop_duplicates and Series.sample:
vil = df['Village'].drop_duplicates().sample(3)
df = df[df['Village'].isin(vil)]
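If the same random selection needs to be reproduced later, Series.sample accepts a random_state argument. A minimal sketch on made-up data; the helper name pick_villages, the toy frame and the seed value are all assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical toy data: 6 villages, 2 rows each.
toy = pd.DataFrame({
    "Village": list("AABBCCDDEEFF"),
    "Value": range(12),
})

def pick_villages(frame, n, seed=None):
    """Return all rows for n randomly chosen distinct villages."""
    # drop_duplicates leaves one entry per village;
    # sample draws without replacement by default.
    vil = frame["Village"].drop_duplicates().sample(n, random_state=seed)
    return frame[frame["Village"].isin(vil)]

first = pick_villages(toy, 3, seed=42)
second = pick_villages(toy, 3, seed=42)

# Same seed, same villages.
print(first.equals(second))  # True
```

Leaving seed as None gives a fresh random selection on every call.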
As a function:
def random_vil(df, x):
    # pick x distinct villages, then return all of their rows
    vil = np.random.choice(df['Village'].unique(), x, replace=False)
    return df[df['Village'].isin(vil)]

df = random_vil(df, 3)
Upvotes: 1