Reputation: 23

Random Sampling from a column several times in Python/Pandas

I am new to Pandas and Python. I will explain my question with an example. I have data such as

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5,6], [7,8], [9,10], [11,12], [13,14]], columns=['A', 'B'])
df
    A   B
0   1   2
1   1   3
2   4   6
3   5   6
4   7   8
5   9   10
6   11  12
7   13  14

I am taking 3 samples from each column.

x = df['A'].sample(n=3)
x = x.reset_index(drop=True)
x

0     7
1     9
2    11

y = df['B'].sample(n=3)
y = y.reset_index(drop=True)
y

0     6
1    12
2     2

I would like to repeat this sample(n=3) step 10 times. I tried [y] * 10, but that only repeats the same sample (6, 12, 2) 10 times. I want to draw a fresh sample from the original data each time, and then build a new DataFrame from the columns sampled from A and B. I thought maybe I should write a for loop, but I am not very familiar with them.

Thanks for the help.

Upvotes: 1

Answers (2)

okartal

Reputation: 2996

As WeNYoBen showed, it is good practice to split the task into two steps:

1. generating the sample replicates,
2. concatenating the data frames.

My suggestion: Write a generator function that is used to create a generator (instead of a list) of your sample replicates. Then you can concatenate the items (in this case, data frames) that the generator yields.

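import pandas as pd

# a generator function: yields one random sample of n rows per replicate
def sample_rep(dframe, n, replicates):
    for _ in range(replicates):
        yield dframe.sample(n)

# label each replicate 1..10 in an outer index level named "replicate"
d = pd.concat(sample_rep(df, n=3, replicates=10),
              keys=range(1, 11), names=["replicate"])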

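The generator produces each sample on the fly, so you do not have to build the list of replicates yourself. pd.concat() consumes sample_rep() on your dataframe and concatenates the data frames it yields. Note that pd.concat() still collects the yielded frames before joining them, so any memory saving is modest for data of this size.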
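To make the shape of the concatenated result concrete, here is a minimal sketch that rebuilds the example data, runs the generator, and pulls out a single replicate. The seed parameter is a hypothetical addition (not part of the code above) used only to make the draws reproducible. Note that dframe.sample() samples whole rows, so A and B stay paired within each row.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)

def sample_rep(dframe, n, replicates, seed=None):
    # seed is a hypothetical parameter added here for reproducibility
    for i in range(replicates):
        state = None if seed is None else seed + i
        yield dframe.sample(n, random_state=state)

# Outer index level "replicate" runs 1..10; inner level is the original row label
d = pd.concat(
    sample_rep(df, n=3, replicates=10, seed=0),
    keys=list(range(1, 11)),
    names=["replicate"],
)

# Select the 3 rows that belong to the first replicate
first = d.xs(1, level="replicate")
print(d.shape)
print(first)
```

The outer index level lets you group or filter by replicate later, for example with d.groupby(level="replicate").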

Upvotes: 1

BENY

Reputation: 323226

Seems like you need

df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True)
Out[353]: 
      A     B
0   7.0   2.0
1  11.0   6.0
2  13.0  12.0

Sorry for the misleading answer, I overlooked the 10 times.

l=[]
count = 1
while (count < 11):
   l.append(df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True))
   count = count + 1

pd.concat(l)
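As an alternative sketch of the same idea, each column can be sampled independently and its index reset before the columns are recombined, which avoids the NaN sorting step. The helper name sample_columns is hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)

def sample_columns(dframe, n):
    # Hypothetical helper: sample every column on its own, then drop the
    # original index so the sampled values line up as rows 0..n-1
    return pd.DataFrame(
        {col: dframe[col].sample(n).reset_index(drop=True) for col in dframe.columns}
    )

# Repeat the independent sampling 10 times and stack the results
result = pd.concat(
    [sample_columns(df, 3) for _ in range(10)],
    keys=list(range(1, 11)),
    names=["replicate"],
)
print(result)
```

Because the columns are sampled separately, a value from A in a given row no longer has to come from the same original row as the value from B.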

Upvotes: 0
