Reputation: 23
I am new to Pandas and Python, so I will explain my question with an example. I have data such as:
df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5,6], [7,8], [9,10], [11,12], [13,14]], columns=['A', 'B'])
df
    A   B
0   1   2
1   1   3
2   4   6
3   5   6
4   7   8
5   9  10
6  11  12
7  13  14
I am taking 3 samples from each column.
x = df['A'].sample(n=3)
x = x.reset_index(drop=True)
x
0     7
1     9
2    11
y = df['B'].sample(n=3)
y = y.reset_index(drop=True)
y
0     6
1    12
2     2
I would like to repeat this sample(n=3) step 10 times.
I tried [y] * 10, but it just repeats the same column (6, 12, 2) 10 times. I want to draw a new sample from the main data each time, and then build a new data frame out of the new columns generated from A and B.
I thought maybe I should write a for loop, but I am not so familiar with them.
Thanks for the help.
Upvotes: 1
Views: 2503
Reputation: 2996
As WeNYoBen showed, it is good practice to split the task into
My suggestion: Write a generator function that is used to create a generator (instead of a list) of your sample replicates. Then you can concatenate the items (in this case, data frames) that the generator yields.
# a generator function
def sample_rep(dframe, n=None, replicates=None):
    for i in range(replicates):
        yield dframe.sample(n)

d = pd.concat(sample_rep(df, n=3, replicates=10),
              keys=range(1, 11), names=["replicate"])
The generator produces each sample on the fly, so you do not have to build the list of data frames yourself. The pd.concat() function iterates over sample_rep() applied to your dataframe, which yields the data frames to concatenate.
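Note that dframe.sample(n) draws whole rows, so the A and B values stay paired. If you want each column sampled independently, as in the question, a variation along these lines might work. This is only a sketch: sample_columns_rep and its seed parameter are names I made up for illustration, and the seed is there only to make the draws reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)

# Hypothetical helper: yields one data frame per replicate,
# with every column sampled independently of the others.
def sample_columns_rep(dframe, n, replicates, seed=None):
    # One shared random state, so successive draws differ from each other
    rng = np.random.RandomState(seed)
    for _ in range(replicates):
        yield pd.DataFrame({
            col: dframe[col].sample(n=n, random_state=rng).reset_index(drop=True)
            for col in dframe.columns
        })

d = pd.concat(sample_columns_rep(df, n=3, replicates=10, seed=0),
              keys=range(1, 11), names=["replicate"])

# Pull out a single replicate by its key on the outer index level
first = d.loc[1]
```

Here d has 30 rows (10 replicates of 3 draws each), and d.loc[k] gives you the k-th replicate as a small data frame.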
Upvotes: 1
Reputation: 323226
Seems like you need
df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True)
Out[353]:
      A     B
0   7.0   2.0
1  11.0   6.0
2  13.0  12.0
Sorry for the misleading answer, I overlooked the 10 times.
l=[]
count = 1
while (count < 11):
    l.append(df.apply(lambda x : x.sample(3)).apply(lambda x : sorted(x,key=pd.isnull)).dropna().reset_index(drop=True))
    count = count + 1
pd.concat(l)
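If you would rather have the 10 samples side by side as new columns, something like the following might be closer to what you are after. It is a sketch rather than the one right way: it resets the index inside apply instead of sorting out the NaN values afterwards (so, unlike the version above, the values are not sorted), and n_reps, samples and wide are just names I picked.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [1, 3], [4, 6], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14]],
    columns=["A", "B"],
)

n_reps = 10  # number of repeated samples

# Each element is a 3-row frame; every column is sampled on its own,
# and resetting the index lines the draws up as rows 0, 1, 2.
samples = [
    df.apply(lambda col: col.sample(3).reset_index(drop=True))
    for _ in range(n_reps)
]

# axis=1 puts the replicates next to each other;
# keys adds an outer column level with the replicate number.
wide = pd.concat(samples, axis=1, keys=range(1, n_reps + 1))
```

wide[1] then gives the A and B columns of the first sample, wide[2] the second, and so on.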
Upvotes: 0