Reputation: 206
I have a numpy.ndarray with 17520 rows and 1000 columns. The array contains only two values, 0 and 0.05. I want to replace each cell that holds 0.05 with a random choice between 0 and 0.05. To do that, I used the following code, as recommended by the post "Random choice over specific values of a DF":
import numpy as np
import pandas as pd
# df starts out as the ndarray and is wrapped in a DataFrame here
df = pd.DataFrame(df)
df.update(np.random.choice([0, 0.05], size=df.shape), filter_func=lambda x: x==0.05)
This solution worked. However, I have another pandas.DataFrame object, df1, and I need to create an additional DataFrame, df_new, that holds the difference between the two. I use the simple operation:
df_new = df1 - df
However, df_new comes out as a DataFrame with different dimensions (17520 rows and 2000 columns) and NaN values.
Do you have any ideas why this is happening?
Thanks
Upvotes: 1
Views: 882
Reputation: 557
I'm not sure where your problem is, since you don't provide detailed information on how you build your DataFrames. In any case, you don't really need DataFrames for this: NumPy is perfectly capable of doing what you need. Here is some sample code you can use:
import numpy as np
# Randomly create the initial arrays, just to prove the code is OK
df1 = np.random.choice([0.0, 0.05], size=(17520,1000))
df2 = np.random.choice([0.0, 0.05], size=(17520,1000))
# Modify them
w1 = np.where(df1 == 0.05)
w2 = np.where(df2 == 0.05)
df1[w1] = np.random.choice([0.0, 0.05], size=len(w1[0]))
df2[w2] = np.random.choice([0.0, 0.05], size=len(w2[0]))
df_new = df1 - df2
Upvotes: 1
Reputation: 19885
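The columns of df and df1 are not the same. pandas aligns on row and column labels during arithmetic, so any label that exists in only one frame yields NaN, and the result carries the union of both label sets. That is why df_new has 2000 columns.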
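To make the alignment behaviour concrete, here is a minimal sketch on small stand-in frames. The shapes and the column labels 10, 11, 12 for df1 are assumptions for illustration only, since the question does not show how df1 is built. It reproduces the doubled column count, then shows two possible ways around it, assuming the two frames really do correspond position by position.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Small stand-ins for the real 17520 x 1000 data (shapes are illustrative).
# df gets the default integer column labels 0, 1, 2.
df = pd.DataFrame(rng.choice([0.0, 0.05], size=(4, 3)))
# df1 gets different labels (hypothetical; the real labels are unknown).
df1 = pd.DataFrame(rng.choice([0.0, 0.05], size=(4, 3)), columns=[10, 11, 12])

# Arithmetic aligns on labels. No column label is shared, so the result
# has the union of both column sets and every cell is NaN.
misaligned = df1 - df
print(misaligned.shape)  # (4, 6)

# Option 1: subtract the underlying arrays, then reuse df1's labels.
df_new = pd.DataFrame(
    df1.to_numpy() - df.to_numpy(),
    index=df1.index,
    columns=df1.columns,
)

# Option 2: give a copy of df the same column labels as df1, then subtract.
df_aligned = df.copy()
df_aligned.columns = df1.columns
df_new_2 = df1 - df_aligned
```

Both options ignore the labels and match cells by position, so they are only appropriate when column i of df is meant to pair with column i of df1. Comparing df.columns and df1.columns (and the two indexes) first is a quick way to confirm where the mismatch is.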
Incidentally, as an alternative to update, the following works too:
df[df == 0.05] = np.random.choice([0., 0.05], size=df.shape)
Upvotes: 1