Pandas: Random integer between values in two columns

Question

How can I create a new column that holds a random integer between the values of two other columns in the same row?

Example df:

import pandas as pd
import numpy as np

data = pd.DataFrame({'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})
data = data[['start', 'end']]  # select by name so the column order is start, end

Result:

   start  end
0      1   10
1      2   20
2      3   30
3      4   40
4      5   50
5      6   60
6      7   70
7      8   80
8      9   90
9     10  100

Now I am trying something like this:

data['rand_between'] = data.apply(lambda x: np.random.randint(data.start, data.end))

or

data['rand_between'] = np.random.randint(data.start, data.end)

But it doesn't work, of course, because data.start is a Series, not a number. How can I use numpy.random with data from columns as a vectorized operation?

jezrael · Accepted Answer

You are close. You need to specify axis=1 to process the data row by row, and change data.start/data.end to x.start/x.end so that the function works with scalars:

data['rand_between'] = data.apply(lambda x: np.random.randint(x.start, x.end), axis=1)

Another possible solution:

data['rand_between'] = [np.random.randint(s, e) for s,e in zip(data['start'], data['end'])]

print(data)
   start  end  rand_between
0      1   10             8
1      2   20             3
2      3   30            23
3      4   40            35
4      5   50            30
5      6   60            28
6      7   70            60
7      8   80            14
8      9   90            85
9     10  100            83
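
Both solutions above still call the random function once per row. As a possible vectorized alternative (not part of the original answer), here is a minimal sketch that assumes a NumPy version providing np.random.default_rng, whose integers method accepts array-valued bounds and draws one value per row. As with np.random.randint, the lower bound is inclusive and the upper bound is exclusive. The seed is arbitrary and only there to make the sketch reproducible.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'start': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'end': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Assumption: a seeded Generator, used here only for reproducibility.
rng = np.random.default_rng(0)

# The two bound arrays are paired element-wise, so row i gets one draw
# from the half-open interval [start[i], end[i]).
data['rand_between'] = rng.integers(data['start'].to_numpy(),
                                    data['end'].to_numpy())

print(data)
```

If the end value itself should be a possible result, passing endpoint=True to integers makes the upper bound inclusive.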
