Kiran

Reputation: 159

Pandas: how to randomly sample whole groups by a column value (e.g. Village)

I know how to randomly sample a few rows from a pandas DataFrame using the sample method:

df_sample = df.sample(n=10)

However, what I need is to sample at random by the values of a column (i.e. Village) in the data frame below.

Dummy Data:

For example, I want to randomly select 3 villages, say Village A, B and C, and get the entire data for those 3 villages, i.e. every row whose Village is A, B or C.
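Since the dummy data table did not survive, here is a small hypothetical stand-in (the "Value" column and all values are made up) that shows the desired output concretely: once A, B and C are chosen, every row of those villages is returned.

```python
import pandas as pd

# Hypothetical stand-in for the dummy data: several rows per village
df = pd.DataFrame({
    "Sr.No": range(1, 10),
    "Village": ["A", "A", "B", "B", "B", "C", "D", "D", "E"],
    "Value": [5, 3, 8, 1, 4, 7, 2, 6, 9],  # hypothetical data column
})

# Suppose villages A, B and C were the ones drawn at random
chosen = ["A", "B", "C"]

# Desired output: every row belonging to those villages
result = df[df["Village"].isin(chosen)]
print(result)
```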


Here is my code, which samples rows rather than villages:

>>> import pandas as pd
>>> import numpy as np
>>> df=pd.read_excel("/home/Study.xlsx")
>>> df=df.sample(n=3)
>>> df
    Sr.No  ...  Village
16     17  ...        I
33     34  ...        Q
36     37  ...        S

So if villages I, Q and S are randomly selected, I need the entire data (all rows) for those 3 villages.
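The attempt above returns exactly 3 rows because df.sample draws rows. A sketch on hypothetical data of expanding those rows to whole villages shows why that is not quite the same as picking 3 villages: two sampled rows can share a village, and villages with more rows are more likely to be hit.

```python
import pandas as pd

# Hypothetical data: village sizes differ on purpose
df = pd.DataFrame({
    "Sr.No": range(1, 10),
    "Village": ["A", "A", "B", "B", "B", "C", "D", "D", "E"],
})

# df.sample picks rows, so only the sampled rows come back
rows = df.sample(n=3, random_state=0)

# Expanding to whole villages: keep every row whose village was hit.
# Caveat: sampled rows may share a village (fewer than 3 villages),
# and larger villages are more likely to be drawn.
expanded = df[df["Village"].isin(rows["Village"])]
print(expanded)
```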

Thanks.

Upvotes: 1

Views: 77

Answers (1)

jezrael

Reputation: 862661

Use numpy.random.choice on the unique values to pick 3 random villages (with replace=False, so the same village cannot be drawn twice), then filter with Series.isin and boolean indexing:

vil = np.random.choice(df['Village'].unique(), 3, replace=False)
df = df[df['Village'].isin(vil)]
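A self-contained sketch of the numpy approach on hypothetical data. It passes replace=False to numpy.random.choice so the 3 villages are distinct; with the default replace=True a village can repeat and fewer than 3 villages come back.

```python
import numpy as np
import pandas as pd

# Hypothetical data with five villages
df = pd.DataFrame({
    "Sr.No": range(1, 10),
    "Village": ["A", "A", "B", "B", "B", "C", "D", "D", "E"],
})

# replace=False guarantees 3 distinct villages
vil = np.random.choice(df["Village"].unique(), 3, replace=False)

# Boolean mask keeps every row of the chosen villages
subset = df[df["Village"].isin(vil)]

print(sorted(vil))
print(subset)
```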

A pandas-only solution with Series.drop_duplicates and Series.sample (which samples without replacement by default):

vil = df['Village'].drop_duplicates().sample(3)
df = df[df['Village'].isin(vil)]
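A sketch of the pandas-only variant on hypothetical data, adding the random_state parameter of Series.sample so the same villages are drawn on every run (useful when results need to be reproducible).

```python
import pandas as pd

# Hypothetical data with five villages
df = pd.DataFrame({
    "Sr.No": range(1, 10),
    "Village": ["A", "A", "B", "B", "B", "C", "D", "D", "E"],
})

# drop_duplicates leaves one entry per village, so sample(3) yields
# 3 distinct villages; random_state fixes the draw
vil = df["Village"].drop_duplicates().sample(3, random_state=42)
subset = df[df["Village"].isin(vil)]
print(subset)

# Same seed, same villages
vil_again = df["Village"].drop_duplicates().sample(3, random_state=42)
```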

To wrap this in a function, pass the DataFrame in explicitly:

def random_vil(df, x):
    vil = np.random.choice(df['Village'].unique(), x, replace=False)
    return df[df['Village'].isin(vil)]

df = random_vil(df, 3)
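A slightly more general sketch of the function, assuming you want a configurable column, an optional seed and a clear error when asking for more villages than exist. The name sample_villages and its parameters are hypothetical; it uses numpy's Generator API (np.random.default_rng) instead of the legacy np.random.choice.

```python
import numpy as np
import pandas as pd

def sample_villages(df, n, column="Village", seed=None):
    """Return all rows for n randomly chosen, distinct values of `column`."""
    values = df[column].unique()
    if n > len(values):
        raise ValueError(f"asked for {n} values but only {len(values)} exist")
    rng = np.random.default_rng(seed)  # seed=None gives fresh randomness
    chosen = rng.choice(values, size=n, replace=False)
    return df[df[column].isin(chosen)]

# Hypothetical data with five villages
df = pd.DataFrame({
    "Sr.No": range(1, 10),
    "Village": ["A", "A", "B", "B", "B", "C", "D", "D", "E"],
})

out = sample_villages(df, 3, seed=0)
print(out)
```

Passing the DataFrame as an argument avoids silently depending on a global df, and a fixed seed makes the selection repeatable.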

Upvotes: 1
