Reputation: 814
Suppose I have a Pandas Dataframe named df
, which has the following structure:-
Column 1 Column 2 ......... Column 104
Row 1 0.01 0.55 3
Row 2 0.03 0.14 1
...
Row 100 0.75 0.56 0
What I am trying to accomplish is that for all rows which match the condition given below, I need to generate 100
more rows with a random value between 0
and 0.05
added to each row:-
is_less = df.iloc[:,-1] > 1
df_try = df[is_less]
df = df.append([df_try]*100,ignore_index=True)
The problem is that I can simply duplicate the rows in df_try
to generate 100
more rows for each case, but I want to add a random value to each row as well, such that each row is different from the others but very similar.
import random
df = df.append([df_try + random.uniform(0,0.05)]*100, ignore_index=True)
What this does is to simply add the fixed random value to df_try
's 100
new rows, but not a unique random value to each row. I know that this is because the above syntax does not iterate over df_try, resulting in the fixed random value being added, but is there a suitable way to add the random values iteratively over the data frame in this case?
Upvotes: 1
Views: 1342
Reputation: 863421
One idea is create 2d array with same size like new appended DataFrame
and add to joined lists with concat
:
N = 10
arr = np.random.uniform(0,0.05, size=(N, len(df.columns)))
is_less = df.iloc[:,-1] > 1
df_try = df[is_less]
df = df.append(pd.concat([df_try]*N) + arr,ignore_index=True)
print (df)
Column 1 Column 2 Column 104
0 0.010000 0.550000 3.000000
1 0.030000 0.140000 1.000000
2 0.750000 0.560000 0.000000
3 0.024738 0.561647 3.045146
4 0.035315 0.584161 3.008656
5 0.022386 0.563025 3.033091
6 0.039175 0.588785 3.004649
7 0.049465 0.594903 3.003303
8 0.027366 0.580478 3.041745
9 0.044721 0.599853 3.001736
10 0.052849 0.589775 3.042434
11 0.033957 0.582610 3.045215
12 0.044349 0.582218 3.027665
Your solution should be changed by list comprehension if need add scalar to each df_try
:
N = 10
is_less = df.iloc[:,-1] > 1
df_try = df[is_less]
df = df.append( [df_try + random.uniform(0, 0.05) for _ in range(N)], ignore_index=True)
print (df)
Column 1 Column 2 Column 104
0 0.010000 0.550000 3.000000
1 0.030000 0.140000 1.000000
2 0.750000 0.560000 0.000000
3 0.036756 0.576756 3.026756
4 0.039357 0.579357 3.029357
5 0.048746 0.588746 3.038746
6 0.040197 0.580197 3.030197
7 0.011045 0.551045 3.001045
8 0.013942 0.553942 3.003942
9 0.054658 0.594658 3.044658
10 0.025909 0.565909 3.015909
11 0.012093 0.552093 3.002093
12 0.058463 0.598463 3.048463
Upvotes: 1
Reputation: 19885
You can combine the copies first and create a single array containing all the random values, add them together, and then append the result to the original:
import numpy as np
n_copies = 2
df = pd.DataFrame(np.c_[np.arange(6), np.random.randint(1, 3, size=6)])
subset = df[df.iloc[:, -1] > 1]
extra = pd.concat([subset] * n_copies).add(np.random.uniform(0, 0.05, len(subset) * n_copies), axis='rows')
result = df.append(extra, ignore_index=True)
print(result)
Output:
0 1
0 0.000000 2.000000
1 1.000000 2.000000
2 2.000000 1.000000
3 3.000000 2.000000
4 4.000000 1.000000
5 5.000000 2.000000
6 0.007723 2.007723
7 1.005718 2.005718
8 3.003063 2.003063
9 5.005238 2.005238
10 0.006509 2.006509
11 1.034742 2.034742
12 3.022345 2.022345
13 5.040911 2.040911
Upvotes: 0