Reputation: 609
I have a large dataframe with 2 columns, like this:
dtm SoC
0 2018-03-01 00:00:00 +0000 39
1 2018-03-01 00:00:01 +0000 39
2 2018-03-01 00:00:02 +0000 39
3 2018-03-01 00:00:03 +0000 39
... ... ...
2678393 2018-04-01 00:59:53 +0100 39
2678394 2018-04-01 00:59:54 +0100 39
2678395 2018-04-01 00:59:55 +0100 39
2678396 2018-04-01 00:59:56 +0100 39
2678397 2018-04-01 00:59:57 +0100 39
2678398 2018-04-01 00:59:58 +0100 39
2678399 2018-04-01 00:59:59 +0100 39
The SoC column is a randomly generated number between 0 and 40. I would like it to be a different random number for every block of 86400 rows (rather than the same number for the entire dataframe).
To be clearer:
-rows 0-86399 1st random number
-rows 86400-172799 2nd random number
-etc
I tried df['SoC'] = np.repeat(random.randint(0,40), len(df)/86400)
but I get the error "Length of values does not match length of index".
Any ideas? Thank you in advance.
Upvotes: 2
Views: 224
Reputation: 862641
First create an array of random integers, using floor division so that the size parameter is an integer (one value per 86400-row block), and then repeat each value 86400 times:
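import numpy as np
#possible duplicated random values (the upper bound 40 is exclusive; use 41 to include 40)
df['SoC'] = np.repeat(np.random.randint(0, 40, size=len(df) // 86400), 86400)
#unique random numbers (works only for up to 40 blocks, because replace=False)
df['SoC'] = np.repeat(np.random.choice(np.arange(0, 40),
                                       size=len(df) // 86400, replace=False), 86400)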
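For a quick check, here is a minimal self-contained sketch of the same approach on a toy dataframe. The block size of 4 (standing in for 86400), the three blocks, the seed, and the use of np.random.default_rng are all assumptions for illustration and not part of the answer above. The attempt in the question asks np.repeat for roughly 31 copies of a single number, so the result cannot match the dataframe length; drawing one value per block and repeating each value a full block length avoids that.

```python
import numpy as np
import pandas as pd

block = 4        # assumption: stands in for 86400 rows per day
n_blocks = 3     # assumption: stands in for the 31 days in the question
n_rows = block * n_blocks

# Toy frame with one timestamp per second, like the question's dtm column
df = pd.DataFrame({
    "dtm": pd.Timestamp("2018-03-01") + pd.to_timedelta(np.arange(n_rows), unit="s"),
})

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable

# One random integer per block; high=41 is exclusive, so values are 0..40
per_block = rng.integers(0, 41, size=len(df) // block)

# Repeat each value `block` times so the result has len(df) entries
df["SoC"] = np.repeat(per_block, block)
```

With the real data, block would be 86400 and the dataframe length (2678400) is an exact multiple of it, so the lengths line up.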
Upvotes: 2
Reputation: 1280
Another way:
arr = np.arange(40)
np.random.shuffle(arr)
arr
array([15, 30, 21, 3, 10, 19, 13, 31, 5, 32, 1, 39, 24, 6, 12, 7, 22,
38, 27, 20, 25, 35, 14, 28, 33, 18, 29, 17, 37, 36, 34, 8, 2, 0,
4, 11, 16, 23, 26, 9])
#keep only one value per 86400-row block so the length matches len(df)
df['SoC'] = np.repeat(arr[:len(df) // 86400], 86400)
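If the number of rows is not an exact multiple of the block length, np.repeat alone will not produce the right length. One possible workaround, sketched below, is to compute a block number for every row and index into the shuffled values with it. This is a different technique from the one above; the toy sizes (10 rows, blocks of 4), the seed, and the names block_id and per_block are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

block = 4                                   # assumption: stands in for 86400
df = pd.DataFrame({"row": np.arange(10)})   # 10 rows: not a multiple of 4

# Block number of every row: 0,0,0,0,1,1,1,1,2,2
block_id = np.arange(len(df)) // block

rng = np.random.default_rng(0)              # seeded only for repeatability

# Shuffle 0..40 and keep one distinct value per block (needs <= 41 blocks)
per_block = rng.permutation(41)[: block_id[-1] + 1]

# Integer-array indexing assigns each row the value of its block
df["SoC"] = per_block[block_id]
```

The last block simply ends up shorter than the others, and the result always has exactly len(df) entries.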
Upvotes: 0