NodeSlayer

Reputation: 73

Generate random timeseries data with dates

I am trying to generate random data (integers) with dates so that I can practice pandas data-analysis commands on it and plot time-series graphs.

             temp     depth   acceleration
2019-01-1 -0.218062 -1.215978 -1.674843
2019-02-1 -0.465085 -0.188715  0.241956
2019-03-1 -1.464794 -1.354594  0.635196
2019-04-1  0.103813  0.194349 -0.450041
2019-05-1  0.437921  0.073829  1.346550

Is there a random DataFrame generator that can produce something like this, with consecutive dates one month apart?

Upvotes: 7

Views: 11387

Answers (2)

DataCruncher

Reputation: 880

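You can use pandas.util.testing. Note that this module has been deprecated since pandas 1.0 and makeTimeDataFrame is no longer available in recent releases, so this only works on older versions: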

import pandas.util.testing as testing
import numpy as np
np.random.seed(1)

testing.N, testing.K = 5, 3  # Setting the rows and columns of the desired data

print(testing.makeTimeDataFrame(freq='MS'))
>>>
                   A         B         C
2000-01-01 -0.488392  0.429949 -0.723245
2000-02-01  1.247192 -0.513568 -0.512677
2000-03-01  0.293828  0.284909  1.190453
2000-04-01 -0.326079 -1.274735 -0.008266
2000-05-01 -0.001980  0.745803  1.519243

Or, if you need more control over the random values being generated, you can build the DataFrame yourself:

import numpy as np
import pandas as pd
np.random.seed(1)

rows, cols = 5, 3
data = np.random.rand(rows, cols)  # swap in another random function to constrain the values
# freq='MS' spaces the dates one month apart, each on the first day of the month;
# other aliases work the same way (e.g. 'min' for minutes)
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')
data_frame = pd.DataFrame(data, columns=['a', 'b', 'c'], index=tidx)
print(data_frame)
>>>
                   a         b         c
2019-01-01  0.992856  0.217750  0.538663
2019-02-01  0.189226  0.847022  0.156730
2019-03-01  0.572417  0.722094  0.868219
2019-04-01  0.023791  0.653147  0.857148
2019-05-01  0.729236  0.076817  0.743955
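
As an illustration of "other random functions", here is a minimal sketch that gives each column its own constraint. It assumes NumPy's Generator API (np.random.default_rng) is available; the column ranges and distribution parameters are arbitrary choices for the example, not anything from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded Generator so the draws are reproducible
rows = 5
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')

# Each column uses a different distribution (ranges are illustrative assumptions)
data_frame = pd.DataFrame(
    {
        'a': rng.integers(0, 100, size=rows),             # integers in [0, 100)
        'b': rng.uniform(-1.0, 1.0, size=rows),           # floats in [-1, 1)
        'c': rng.normal(loc=20.0, scale=5.0, size=rows),  # normal, mean 20, std 5
    },
    index=tidx,
)
print(data_frame)
```

Building the frame from a dict of arrays keeps each column's dtype separate, so 'a' stays an integer column while 'b' and 'c' are floats.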

Upvotes: 12

jezrael

Reputation: 862511

Use the numpy.random.rand or numpy.random.randint functions with the DataFrame constructor:

import numpy as np
import pandas as pd

np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.rand(N, 3), columns=['temp','depth','acceleration'], index=rng)

print(df)
                temp     depth  acceleration
2019-01-01  0.903482  0.393081      0.623970
2019-02-01  0.637877  0.880499      0.299172
2019-03-01  0.702198  0.903206      0.881382
2019-04-01  0.405750  0.452447      0.267070
2019-05-01  0.162865  0.889215      0.148476
2019-06-01  0.984723  0.032361      0.515351
2019-07-01  0.201129  0.886011      0.513620
2019-08-01  0.578302  0.299283      0.837197
2019-09-01  0.526650  0.104844      0.278129
2019-10-01  0.046595  0.509076      0.472426
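
The sample in the question contains negative values, which rand (uniform on [0, 1)) never produces. A sketch of one way to get similar-looking floats, assuming the intent is standard normal data, is to use numpy.random.randn instead:

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)

# randn draws from a standard normal distribution (mean 0, std 1),
# so the values can be negative as well as positive
df = pd.DataFrame(np.random.randn(N, 3),
                  columns=['temp', 'depth', 'acceleration'],
                  index=rng)
print(df)
```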

If you need integers:

np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.randint(20, size=(N, 3)),  # integers in [0, 20)
                  columns=['temp','depth','acceleration'],
                  index=rng)

print(df)
            temp  depth  acceleration
2019-01-01     8     18             5
2019-02-01    15     12            10
2019-03-01    16     16             7
2019-04-01     5     19            12
2019-05-01    16     18             5
2019-06-01    16     15             1
2019-07-01    14     12            10
2019-08-01     0     11            18
2019-09-01    15     19             1
2019-10-01     3     16            18
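
To confirm that the dates really are one month apart, a quick check along these lines may help. It is only a sketch that rebuilds the integer frame and asks pandas which frequency it detects in the index:

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.randint(20, size=(N, 3)),
                  columns=['temp', 'depth', 'acceleration'],
                  index=rng)

# infer_freq inspects the index and returns the frequency alias it detects
print(pd.infer_freq(df.index))    # month-start spacing is reported as 'MS'
print((df.index.day == 1).all())  # True when every date is the first of a month
```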

Upvotes: 10
