Reputation: 23
My initial dataframe looks as follows:
import pandas as pd
df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x":    [1, 2, 3, 4, 9, 11],
    "y":    [5, 6, 7, 8, 3, 2],
})
So I have two IDs (1 and 2), i.e. two different time series. Now I want to add some random noise to the x and y values of each ID and save the result under new IDs (with the same length) in the initial df:
# Noise
import numpy as np
original = df["x"].to_numpy()  # the signal to perturb, e.g. the x column
noise = np.random.normal(0, 1, len(original))  # one noise sample per element
new_signal = original + noise
# https://stackoverflow.com/questions/14058340/adding-noise-to-a-signal-in-python
So the resulting df would look something like the following (the values are just an example of what the output could be):
df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2,  3, 3, 3, 3,  4, 4],
    "time": [1, 2, 3, 4, 5, 6,  7, 8, 9, 10,  11, 12],
    "x":    [1, 2, 3, 4, 9, 11,  1.0005, 2.3256, 3.1256, 4.5647,  9.6514, 11.4567],
    "y":    [5, 6, 7, 8, 3, 2,  5.0505, 6.0276, 7.1056, 8.5607,  3.6014, 2.4567],
})
As you can see, two new IDs (3 and 4) have been added, holding the original values plus noise.
Currently I am trying this with various loops, but it seems quite complicated. Any suggestions?
Bonus question: how can I add not just one noisy copy, but three?
Upvotes: 1
Views: 230
Reputation: 260790
You can use reindex to replicate the rows, then add values that increment the id and time and put noise on the data.
This works for an arbitrary number of repeats:
import numpy as np
N = 3
(df.reindex(np.tile(df.index, N))  # replicate the dataframe N times
   .add(np.c_[np.repeat(np.arange(N), len(df)) * df['id'].max(),    # offset id by the max id per copy
              np.repeat(np.arange(N), len(df)) * df['time'].max(),  # offset time by the max time per copy
              np.r_[np.zeros((len(df), 2)),                         # no noise for the first copy
                    np.random.normal(size=(len(df)*(N-1), 2))]      # noise for the other copies
             ])
)
Example with N=3 (the noise values differ between runs):
    id  time          x         y
0  1.0   1.0   1.000000  5.000000
1  1.0   2.0   2.000000  6.000000
2  1.0   3.0   3.000000  7.000000
3  1.0   4.0   4.000000  8.000000
4  2.0   5.0   9.000000  3.000000
5  2.0   6.0  11.000000  2.000000
0  3.0   7.0   0.651240  4.713942
1  3.0   8.0   1.426533  5.446687
2  3.0   9.0   3.187928  7.430646
3  3.0  10.0   2.998382  9.421992
4  4.0  11.0  10.282871  2.108504
5  4.0  12.0  10.531258  2.439636
0  5.0  13.0  -0.200542  5.286711
1  5.0  14.0   0.350241  8.114173
2  5.0  15.0   1.843902  6.725896
3  5.0  16.0   3.831534  7.964400
4  6.0  17.0   7.612370  2.737872
5  6.0  18.0  12.129517  2.809689
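If integer id and time columns and a fresh 0..n-1 index are preferred, the same replicate-then-offset idea can be written column by column. This is only a sketch, assuming the question's df. The names rng, rep and noise are mine, the seed is arbitrary, and offsetting each copy by df["id"].max() and df["time"].max() is an assumption about the desired numbering, based on the expected output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x":    [1, 2, 3, 4, 9, 11],
    "y":    [5, 6, 7, 8, 3, 2],
})

N = 3                                    # total number of blocks, including the original
rng = np.random.default_rng(0)           # arbitrary seed, only for reproducibility

# Block number of every output row: 0 for the original rows, 1 for the first copy, ...
rep = np.repeat(np.arange(N), len(df))

# Replicate the rows N times and discard the duplicated index labels
out = df.reindex(np.tile(df.index, N)).reset_index(drop=True)

# Integer offsets keep the id and time columns as integers
out["id"] = out["id"] + rep * df["id"].max()
out["time"] = out["time"] + rep * df["time"].max()

# Gaussian noise for x and y; the first block (the original rows) stays untouched
noise = rng.normal(0, 1, size=(len(out), 2))
noise[: len(df)] = 0
out[["x", "y"]] = out[["x", "y"]] + noise

print(out)
```

Updating the columns one at a time avoids adding a float array to the whole frame, which is what turns id and time into floats in the chained version.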
Upvotes: 0
Reputation: 120429
You can build a new dataframe and concat them:
df1 = pd.concat([df['id'] + df['id'].max(),
                 df['time'] + df['time'].max(),
                 df['x'] + np.random.normal(0, 1, len(df)),
                 df['y'] + np.random.normal(0, 1, len(df))], axis=1) \
        .set_index(df.index + len(df))  # continue the index after the original rows
out = pd.concat([df, df1])
Output:
>>> out
    id  time          x         y
0    1     1   1.000000  5.000000
1    1     2   2.000000  6.000000
2    1     3   3.000000  7.000000
3    1     4   4.000000  8.000000
4    2     5   9.000000  3.000000
5    2     6  11.000000  2.000000
6    3     7   1.479734  5.720535
7    3     8   0.076273  6.256060
8    3     9   2.856642  6.845974
9    3    10   4.119396  7.738969
10   4    11   9.220569  2.710783
11   4    12  10.451495  1.245976
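For the bonus question, the concat approach can be repeated once per noisy copy. The following is a minimal sketch, assuming the question's df. The names n_copies, rng and copies are hypothetical, the seed is arbitrary, and each copy is assumed to be offset by a multiple of the maximum id and time, in line with the expected output in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x":    [1, 2, 3, 4, 9, 11],
    "y":    [5, 6, 7, 8, 3, 2],
})

n_copies = 3                      # number of noisy copies to append (hypothetical name)
rng = np.random.default_rng(0)    # arbitrary seed, only for reproducibility

# Build one noisy copy per k, shifting id and time so they do not collide
copies = [
    df.assign(
        id=df["id"] + k * df["id"].max(),
        time=df["time"] + k * df["time"].max(),
        x=df["x"] + rng.normal(0, 1, len(df)),
        y=df["y"] + rng.normal(0, 1, len(df)),
    )
    for k in range(1, n_copies + 1)
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0..5
out = pd.concat([df, *copies], ignore_index=True)

print(out)
```

Each copy draws its own noise, so the three new versions of a series differ from one another as well as from the original.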
Upvotes: 1