PyuPyuPyu

Reputation: 1

Pandas - Compare two dataframes and replace values matching condition

I have two pandas dataframes (df1 and df2) with exactly the same number of columns and rows (the column and index names are the same as well). The values in these two dataframes may or may not differ.

I want to compare every value in df1 with the value in the corresponding position in df2, and if the value in df2 is equal to or greater than the value in df1, I want to replace the value in df1 with a random integer.
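The comparison itself needs no loop: comparing two DataFrames that share the same index and columns with >= produces an element-wise boolean DataFrame. A small sketch, where the frames df1 and df2 hold made-up illustrative values rather than the asker's data:

```python
import pandas as pd

# Illustrative frames with identical index and columns (values are made up)
df1 = pd.DataFrame({"a": [5, 1, 7], "b": [2, 9, 4]})
df2 = pd.DataFrame({"a": [5, 3, 2], "b": [1, 9, 8]})

# Element-wise comparison: True where df2 is equal to or greater than df1
replace_mask = df2 >= df1
# column a -> True, True, False; column b -> False, True, True
print(replace_mask)
```

The True cells are exactly the positions that should receive a random integer.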

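So I thought I would want something like this (but preferably without any loops at all):

import numpy as np

for i in range(df1.shape[0]):
    for j in range(df1.shape[1]):
        # for integers, df1 - df2 < 1 is the same as df2 >= df1
        if df1.iat[i, j] - df2.iat[i, j] < 1:
            df1.iat[i, j] = np.random.randint(0, 100)  # 0-99 is an arbitrary range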

I tried looking at the pandas df.replace function in combination with the df.where function, but I just can't seem to get it to work.
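DataFrame.where keeps a value where its condition is True and takes it from "other" elsewhere, so the condition has to be the opposite of the replacement rule; DataFrame.mask is its inverse and reads more directly. A minimal sketch, assuming random integers in an arbitrary 0 to 99 range (the question does not specify one) and a NumPy Generator seeded only for reproducibility:

```python
import numpy as np
import pandas as pd

# Illustrative frames (values are made up)
df1 = pd.DataFrame({"a": [5, 1, 7], "b": [2, 9, 4]})
df2 = pd.DataFrame({"a": [5, 3, 2], "b": [1, 9, 8]})

rng = np.random.default_rng(0)
# One candidate random integer per cell, aligned with df1 (0-99 range is an assumption)
randoms = pd.DataFrame(
    rng.integers(0, 100, size=df1.shape), index=df1.index, columns=df1.columns
)

# mask() replaces cells where the condition is True
result = df1.mask(df2 >= df1, randoms)

# Equivalent with where(): keep cells where df2 < df1, replace the rest
same = df1.where(df2 < df1, randoms)
```

Both calls return a new DataFrame; assign the result back to df1 to overwrite it.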

Edit: I want to add something I forgot previously. When assigning my random int, I want it to be within a range based on the corresponding value. So it will be:

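import numpy as np

for i in range(df1.shape[0]):
    for j in range(df1.shape[1]):
        v = df1.iat[i, j]
        if v - df2.iat[i, j] < 1:
            # random int from v - 10 to v + 10 (randint's upper bound is exclusive)
            df1.iat[i, j] = np.random.randint(v - 10, v + 11)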

I believe this is not possible with Pietro Tortella's answer, as it processes the dataframe as a whole.

Does anyone know how to solve this?

Upvotes: 0

Views: 1958

Answers (1)

Pietro Tortella

Reputation: 1114

If memory is not a concern, I would create a third DataFrame of random numbers and make the substitution using the comparison df2 >= df1 as a boolean mask.

For instance, something like

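import numpy as np
import pandas as pd

# One random integer per cell, aligned with df1 (the 0-99 range is arbitrary)
randoms = pd.DataFrame(
    np.random.randint(0, 100, size=df1.shape),
    index=df1.index,
    columns=df1.columns
)

mask = df2 >= df1           # True where df2 is equal to or greater than df1
df1[mask] = randoms[mask]   # overwrite only the masked cells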
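The range requirement from the question's edit (a value within 10 of the original) fits the same boolean-mask idea: draw one random offset per cell and add it to df1, so each candidate replacement already depends on its own cell's value. A hedged sketch, assuming the range is inclusive on both ends (endpoint=True) and using made-up illustrative frames:

```python
import numpy as np
import pandas as pd

# Illustrative frames (values are made up)
df1 = pd.DataFrame({"a": [50, 10, 70], "b": [20, 90, 40]})
df2 = pd.DataFrame({"a": [50, 30, 20], "b": [10, 90, 80]})

rng = np.random.default_rng(42)
# One offset per cell in [-10, 10]; endpoint=True makes the upper bound inclusive
offsets = rng.integers(-10, 10, size=df1.shape, endpoint=True)
candidates = df1 + offsets  # each candidate lies within +/-10 of its own df1 value

# Replace only where df2 is equal to or greater than df1
result = df1.mask(df2 >= df1, candidates)
```

No loop is needed because the per-cell range is encoded in the candidates frame before the mask is applied.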

Upvotes: 2
