Jonathan Budez

Reputation: 206

Subtraction between two DataFrames yields NaN values

I have a numpy.ndarray with 17520 rows and 1000 columns. The array contains only two values, 0 and 0.05. I wanted to replace each cell that has a value of 0.05 with a random choice between 0 and 0.05. To do that, I used the following code, as recommended in the post "Random choice over specific values of a DF":

import numpy as np
import pandas as pd

df = pd.DataFrame(df)  # df starts out as the 17520 x 1000 ndarray
df.update(np.random.choice([0, 0.05], size=df.shape), filter_func=lambda x: x == 0.05)

This solution worked. However, I have another pandas.DataFrame object, df1, and I need to create a new dataframe, df_new, that is the difference of these two dataframes. I use the simple operation:

df_new = df1 - df

However, df_new is a dataframe with different dimensions (17520 rows and 2000 columns), and it is filled with NaN values.

Do you have any idea why this is happening?

Thanks

Upvotes: 1

Views: 882

Answers (2)

Marco Lombardi

Reputation: 557

I am not sure where your problem is, since you have not provided details on how you build your DataFrames. In any case, you do not really need DataFrames for this: NumPy is perfectly capable of doing what you need. Here is some sample code that you can use:

import numpy as np

# Randomly create the initial arrays, just to prove the code is OK
df1 = np.random.choice([0.0, 0.05], size=(17520,1000))
df2 = np.random.choice([0.0, 0.05], size=(17520,1000))

# Modify them
w1 = np.where(df1 == 0.05)
w2 = np.where(df2 == 0.05)
df1[w1] = np.random.choice([0.0, 0.05], size=len(w1[0]))
df2[w2] = np.random.choice([0.0, 0.05], size=len(w2[0]))

df_new = df1 - df2

Upvotes: 1

gmds

Reputation: 19885

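The columns of df and df1 are not the same. pandas aligns the two frames on their row and column labels before subtracting, so the result contains the union of the columns (1000 + 1000 = 2000), and every cell whose label is missing from one of the frames is NaN.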
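
A minimal sketch of the effect, assuming df was built from a bare ndarray (so it carries default integer column labels) while df1 carries different labels. The small frames and the labels 10 and 11 are hypothetical stand-ins, not the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: same shape, but no column label in common.
df1 = pd.DataFrame(np.ones((3, 2)), columns=[10, 11])
df = pd.DataFrame(np.full((3, 2), 0.05))  # default column labels 0, 1

# Subtraction aligns on labels: the result has the union of the columns
# (4 here) and every cell is NaN because no label exists in both frames.
misaligned = df1 - df

# Option 1: subtract the underlying array, so cells are matched by position.
by_position = df1 - df.to_numpy()

# Option 2: give df the same column labels as df1, then subtract.
df_relabelled = df.copy()
df_relabelled.columns = df1.columns
by_label = df1 - df_relabelled

print(misaligned.shape, by_position.shape, by_label.shape)
```

Either option should keep the original shape. Which one fits depends on whether the two frames are meant to correspond by position or by label, which the question does not say.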

Incidentally, as an alternative to update, the following also works:

df[df == 0.05] = np.random.choice([0., 0.05], size=df.shape)

Upvotes: 1
