Subtraction between two DataFrames yields NaN values

Question

I have a numpy.ndarray with 17520 rows and 1000 columns that contains only two values, 0 and 0.05. I wanted to replace each cell equal to 0.05 with a random choice between 0 and 0.05. To do that, I used the following code, as recommended in the post "Random choice over specific values of a DF":

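import numpy as np
import pandas as pd

# df starts out as the numpy.ndarray described above
df = pd.DataFrame(df)
# Overwrite only the cells equal to 0.05 with a random pick from [0, 0.05]
df.update(np.random.choice([0, 0.05], size=df.shape), filter_func=lambda x: x == 0.05)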

This solution worked. However, I have another pandas.DataFrame object, df1, and I need to create a new DataFrame, df_new, as the difference between the two. I use the simple operation:

df_new = df1 - df

However, df_new has different dimensions (17520 rows and 2000 columns) and contains NaN values.

Do you have any idea why this is happening?

Thanks

Marco Lombardi · Accepted Answer

I am not sure where your problem is, since you do not provide details on how you build your DataFrames. In any case, you do not really need DataFrames for this: NumPy is perfectly capable of doing what you need. Here is some sample code you can use:

import numpy as np

# Randomly create the initial arrays, just to prove the code is OK
df1 = np.random.choice([0.0, 0.05], size=(17520,1000))
df2 = np.random.choice([0.0, 0.05], size=(17520,1000))

# Modify them
w1 = np.where(df1 == 0.05)
w2 = np.where(df2 == 0.05)
df1[w1] = np.random.choice([0.0, 0.05], size=len(w1[0]))
df2[w2] = np.random.choice([0.0, 0.05], size=len(w2[0]))

df_new = df1 - df2
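
If you do want to stay in pandas, one possible cause of the symptom (this is a guess, since the question does not show how df1 is built) is that the two DataFrames carry different column labels. pandas aligns arithmetic on labels rather than on position, so columns that do not match produce the union of both column sets, filled with NaN. The sketch below uses small stand-in frames with hypothetical labels and shows one way to subtract by position instead:

```python
import numpy as np
import pandas as pd

# Small stand-ins for the real 17520 x 1000 data (shapes and labels are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice([0.0, 0.05], size=(4, 3)))    # default columns 0, 1, 2
df1 = pd.DataFrame(rng.choice([0.0, 0.05], size=(4, 3)),
                   columns=range(3, 6))                    # hypothetical labels 3, 4, 5

# Label-aligned subtraction: no column label is shared, so the result
# has the union of the columns (6 here) and every cell is NaN
misaligned = df1 - df

# Positional subtraction: drop the labels from one operand first
df_new = df1 - df.to_numpy()
```

Comparing df.columns with df1.columns (and df.index with df1.index) on the real data should confirm or rule out this explanation.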
