Alex

Reputation: 69

How to feed random numbers as indices to pandas data frame?

I'm trying to draw the same random sample from two pandas data frames. If random rows 2, 5, 8 are selected from frame A, then the same rows 2, 5, 8 must be selected from frame B. I first generate a random sample of row numbers and now want to use it to index the rows of each frame. How can I do that? The code should look like

idx = list(random.sample(range(X_train.shape[0]),5000))

lgstc_reg[i].fit(X_train[idx,:], y_train[idx,:]) 

However, running the code gives an error.

Upvotes: 1

Views: 1582

Answers (2)

sam

Reputation: 1896

Hope this helps!

>>> df1
   value  ID
0      3   2
1      4   2
2      7   8
3      8   8
4     11   8
>>> df2
   value  distance
0      3         0
1      4         0
2      7         1
3      8         0
4     11         0

I have two data frames. I want to select random rows of df1 along with the corresponding rows of df2.

First I create selection_index, the index labels of a random subset of rows of df1, using the pandas built-in method sample. Then I use this index to locate the same rows in df1 and df2 with another built-in indexer, loc.

>>> selection_index = df1.sample(2).index
>>> selection_index
Int64Index([3, 1], dtype='int64')
>>> df1.loc[selection_index]
   value  ID
3      8   8
1      4   2
>>> df2.loc[selection_index]
   value  distance
3      8         0
1      4         0
>>>

In your case, this would look something like

idx = X_train.sample(5000).index

lgstc_reg[i].fit(X_train.loc[idx], y_train.loc[idx]) 
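Note that loc selects by index label, so this only pairs up the right rows when X_train and y_train share the same index. A minimal sketch of that assumption follows; the small X_train and y_train built here are hypothetical stand-ins, not the asker's data, and random_state is set only to make the sketch repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X_train / y_train; both share the same index labels.
labels = [f"row{i}" for i in range(10)]
X_train = pd.DataFrame({"f1": np.arange(10), "f2": np.arange(10) * 2}, index=labels)
y_train = pd.Series(np.arange(10) % 2, index=labels, name="target")

# Draw the sample once and keep only its index labels.
idx = X_train.sample(4, random_state=0).index

# Select the same labels from both objects; the rows stay paired.
X_sub = X_train.loc[idx]
y_sub = y_train.loc[idx]

print(X_sub)
print(y_sub)
```

If the two objects do not share an index, selecting by position with iloc is the safer choice.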

Upvotes: 0

Ian

Reputation: 3898

Use iloc:

indexes = [2,5,8]  # in your case this is the randomly generated list
A.iloc[indexes]
B.iloc[indexes]
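The error in the question most likely comes from X_train[idx,:], which is NumPy-style indexing; a DataFrame does not accept it directly, and iloc is the positional equivalent. A minimal sketch combining random.sample with iloc, where the frames A and B are made-up examples and the seed is fixed only so the run is repeatable:

```python
import random

import pandas as pd

# Made-up example frames with the same number of rows.
A = pd.DataFrame({"a": range(100, 110)})
B = pd.DataFrame({"b": range(200, 210)})

random.seed(0)  # only to make this sketch repeatable

# Row positions drawn without replacement, as in the question.
positions = random.sample(range(len(A)), 3)

# iloc selects by integer position, so both frames return the same rows.
A_sub = A.iloc[positions]
B_sub = B.iloc[positions]

print(positions)
print(A_sub)
print(B_sub)
```

With this approach the question's call would become something like X_train.iloc[idx] and y_train.iloc[idx].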

Alternatively, sample both frames with the same random seed:

random_seed = 42
A.sample(3, random_state=random_seed)
B.sample(3, random_state=random_seed)

Provided A and B have the same number of rows and the same index, the sampled DataFrames will have the same index.
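A quick way to check the seeded approach is to compare the indexes of the two samples. This is a small sketch with made-up frames of equal length; same_rows is a name introduced here for illustration.

```python
import pandas as pd

# Made-up example frames with equal length and the default index.
A = pd.DataFrame({"a": range(10)})
B = pd.DataFrame({"b": range(10, 20)})

random_seed = 42
A_sub = A.sample(3, random_state=random_seed)
B_sub = B.sample(3, random_state=random_seed)

# True when both samples picked the same rows in the same order.
same_rows = A_sub.index.equals(B_sub.index)
print(same_rows)
```

If the frames differ in length, the same seed no longer guarantees matching rows, so drawing the index once and reusing it is more robust.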

Upvotes: 1
