makpalan

Reputation: 145

Pandas (with def and np.where): error with values in a dataframe row conditioned on another dataframe row

I have dataframes A of shape X×Y holding values, and dataframes B of shape Z×Y that are to be filled with statistics calculated from A.

As an example:

import numpy as np
import pandas as pd
A = pd.DataFrame(np.array(range(9)).reshape((3,3)))
B = pd.DataFrame(np.array(range(6)).reshape((2,3)))

Now I need to fill row 1 of B with the column-wise quantile(0.5) of A wherever row 0 of B is greater than 1, and with np.nan elsewhere. I need to do this inside a function like:

def mydef(df0, df1):
    # Fill row 1 of df1 in place: column medians of df0 where row 0 of df1 > 1, else NaN
    df1.loc[1] = np.where(df1.loc[0] > 1,
                          df0.quantile(0.5),
                          np.nan)

mydef(A,B)

Now B is:

    0   1   2
0   0.0 1.0 2.0
1   NaN NaN 5.0

It works perfectly for these mock dataframes and all my real ones apart from one. For that one this error is raised:

ValueError: cannot set using a list-like indexer with a different length than the value

When I run the same code without calling a function, it doesn't raise any error. Since I need to use a function, any suggestion?

Upvotes: 1

Views: 52

Answers (1)

makpalan

Reputation: 145

I found the error. I erroneously had the same label twice in the index. Essentially my dataframe B was something like:

B = pd.DataFrame(np.array(range(9)).reshape((3,3)), index=[0,0,1])

so that calling the def:

def mydef(df0, df1):
    df1.loc[1] = np.where(df1.loc[0] > 1,
                          df0.quantile(0.5),
                          np.nan)

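would produce a shape mismatch: because the label 0 appears twice, df1.loc[0] returns a two-row DataFrame rather than a Series, so np.where yields a 2x3 array, which cannot be assigned to the single row df1.loc[1].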
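
The mismatch can be inspected without triggering the error. This is only a minimal sketch: it reuses the mock frames from this post, and the float dtype is my own assumption, chosen so that NaN fits without changing column types. It stops before the assignment and just prints the shapes involved.

```python
import numpy as np
import pandas as pd

# Mock data; dtype=float is an assumption made for this sketch
A = pd.DataFrame(np.arange(9, dtype=float).reshape((3, 3)))
# B with the label 0 repeated in the index
B = pd.DataFrame(np.arange(9, dtype=float).reshape((3, 3)), index=[0, 0, 1])

cond = B.loc[0] > 1  # two rows carry label 0, so this is a DataFrame
values = np.where(cond, A.quantile(0.5), np.nan)

print(B.index.is_unique)    # False
print(type(cond).__name__)  # DataFrame
print(values.shape)         # (2, 3)
print(B.loc[1].shape)       # (3,)
```

The target row holds 3 values while np.where returns a 2x3 array, which is consistent with the ValueError above.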

I'm still not sure why the same code worked outside the function.
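
To catch a duplicated label early, a guard could be added at the top of the function. The following is a sketch, not a definitive fix: `mydef_checked`, `good`, `bad` and the error message are hypothetical names introduced here, and the float dtype is again my assumption.

```python
import numpy as np
import pandas as pd

def mydef_checked(df0, df1):
    # Hypothetical guard: fail early with a clear message on duplicated labels
    if not df1.index.is_unique:
        dupes = df1.index[df1.index.duplicated()].unique().tolist()
        raise ValueError(f"df1 has duplicated index labels: {dupes}")
    # Same logic as mydef: fill row 1 of df1 in place
    df1.loc[1] = np.where(df1.loc[0] > 1, df0.quantile(0.5), np.nan)

A = pd.DataFrame(np.arange(9, dtype=float).reshape((3, 3)))
good = pd.DataFrame(np.arange(6, dtype=float).reshape((2, 3)))
bad = pd.DataFrame(np.arange(9, dtype=float).reshape((3, 3)), index=[0, 0, 1])

mydef_checked(A, good)  # unique index: row 1 of `good` is filled in place

message = None
try:
    mydef_checked(A, bad)  # duplicated label 0: the guard raises
except ValueError as err:
    message = str(err)
print(message)
```

With this guard, the problematic dataframe is reported by its duplicated labels instead of failing inside the .loc assignment.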

Upvotes: 1
