Mateusz Konopelski

Reputation: 1042

Pandas: Random integer between values in two columns

How can I create a new column containing a random integer between the values of two other columns in the same row?

Example df:

import pandas as pd
import numpy as np

data = pd.DataFrame({'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
data = data[['start', 'end']]  # select by name so the order is start, end on any pandas version

Now I am trying something like this:

data['rand_between'] = data.apply(lambda x: np.random.randint(data.start, data.end))

or

data['rand_between'] = np.random.randint(data.start, data.end)

But this doesn't work, because data.start is a Series, not a number. How can I use numpy.random with data from the columns as a vectorized operation?

Upvotes: 1

Views: 4382

Answers (2)

user2285236

Reputation:

If you want to truly vectorize this, you can generate a random number between 0 and 1 for each row and scale it to the range between that row's start and end values. The + 1, followed by truncation to int, makes end inclusive:

(
    data['start'] + np.random.rand(len(data)) * (data['end'] - data['start'] + 1)
).astype('int')

Out: 
0     1
1    18
2    18
3    35
4    22
5    27
6    35
7    23
8    33
9    81
dtype: int64
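
As a possible alternative to scaling floats by hand, NumPy's Generator.integers accepts array-like bounds, so it can draw one integer per row directly. This is only a sketch, not part of the original answer: the seed value 0 and the choice of endpoint=True (to include end) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Seeded generator so the draw is reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)

# low and high are arrays of equal length, so one integer is drawn per row;
# endpoint=True makes the upper bound ('end') inclusive
data['rand_between'] = rng.integers(data['start'].to_numpy(),
                                    data['end'].to_numpy(),
                                    endpoint=True)

print(data)
```

Drop endpoint=True if end should be excluded, which matches the default behaviour of np.random.randint.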

Upvotes: 2

jezrael

Reputation: 863236

You are close. You need to specify axis=1 to process the data row by row, and change data.start/end to x.start/end so that you work with scalars:

data['rand_between'] = data.apply(lambda x: np.random.randint(x.start, x.end), axis=1)

Another possible solution:

data['rand_between'] = [np.random.randint(s, e) for s,e in zip(data['start'], data['end'])]

print(data)
   start  end  rand_between
0      1   10             8
1      2   20             3
2      3   30            23
3      4   40            35
4      5   50            30
5      6   60            28
6      7   70            60
7      8   80            14
8      9   90            85
9     10  100            83
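
One detail worth keeping in mind with the row-wise approaches above: np.random.randint excludes its upper bound, so end itself is never drawn, and a row where start equals end raises a ValueError. If an inclusive upper bound is wanted, a minimal sketch is to pass e + 1. The small three-row frame below is a made-up example chosen so that the result is deterministic.

```python
import numpy as np
import pandas as pd

# Hypothetical edge case: start == end in every row
data = pd.DataFrame({'start': [1, 2, 3], 'end': [1, 2, 3]})

# np.random.randint(s, e) would raise ValueError here because the
# half-open range [s, e) is empty; e + 1 makes 'end' inclusive
data['rand_between'] = [np.random.randint(s, e + 1)
                        for s, e in zip(data['start'], data['end'])]

print(data['rand_between'].tolist())  # [1, 2, 3], the only possible values
```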

Upvotes: 3
