afriedman111

Reputation: 2331

Virtual column with calculation in Vaex

I want to set a virtual column to a calculation using another column in Vaex. I need to use an if statement inside this calculation. In general I want to call

df['calculation_col'] = log(df['original_col']) if df['original_col'] == 0 else -4

I then try to run the count function in Vaex:

hist = df.count(
    binby='calculation_col',
    limits=limits,
    shape=binnum,
    delay=True
)

When I execute this code, I get the following error: ValueError: zero-size array to reduction operation minimum which has no identity

How can I use a conditional for a virtual column in Vaex?

Upvotes: 3

Views: 514

Answers (2)

Joco

Reputation: 813

Probably the most "vaex" way to do this would be to use where:

import vaex
df = vaex.example()
# The syntax is where(condition, value_if_true, value_if_false)
df['calculated_col'] = df.func.where(df['x'] < 10, 0, -4)

Upvotes: 3

SultanOrazbayev

Reputation: 16561

It might be useful to use a mask for subsetting the relevant rows:

import vaex

df = vaex.example()

mask = df["id"] < 10

df["new_col"] = mask * df["x"] + ~mask * (-4)

print(df[['id', 'x', 'new_col']].head(4))
# #    id          x    new_col
# 0     0   1.23187     1.23187
# 1    23  -0.163701   -4
# 2    32  -2.12026    -4
# 3     8   4.71559     4.71559
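To see why the arithmetic works, here is a minimal sketch of the same expression in plain NumPy, used only as a stand-in for the Vaex expressions (the array names `ids` and `x` are illustrative, and the values are copied from the four rows printed above). A boolean mask behaves like 1 and 0 under multiplication, so each row keeps exactly one of the two terms:

```python
import numpy as np

# Values copied from the four rows printed above (illustrative stand-in data).
ids = np.array([0, 23, 32, 8])
x = np.array([1.23187, -0.163701, -2.12026, 4.71559])

mask = ids < 10

# True acts as 1 and False as 0, so each row keeps one of the two terms.
new_col = mask * x + ~mask * (-4)

# The same selection written as an explicit elementwise conditional.
via_where = np.where(mask, x, -4)

print(new_col)
print(np.array_equal(new_col, via_where))
```

Both forms select the same values here; the mask form relies on every value in `x` being finite, since multiplying a non-finite value by zero does not give zero.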

Kindly note that in the original script, taking np.log of zero would make NumPy emit a divide-by-zero warning and return -inf, so using np.log1p might be more appropriate in that case.
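The behaviour at zero can be checked with a short NumPy-only sketch (the array `col` and the fallback value -4 are illustrative, the latter taken from the question). It compares the plain logarithm, a where-style guard, the mask arithmetic, and np.log1p on a column that contains a zero:

```python
import numpy as np

col = np.array([0.0, 1.0, 100.0])

# np.log(0) gives -inf with a RuntimeWarning (not an exception) by default;
# errstate silences the warnings for this demonstration.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(col)
    # An elementwise conditional replaces the -inf with the fallback value.
    guarded = np.where(col > 0, logs, -4.0)
    # Mask arithmetic does not: False * -inf is nan, and nan + (-4) stays nan.
    masked = (col > 0) * logs + (col <= 0) * -4.0

# log1p computes log(1 + x): finite at zero, but a different quantity than log(x).
shifted = np.log1p(col)

print(logs[0], guarded[0], masked[0], shifted[0])
```

So np.log1p avoids the infinity but changes the values being binned, while a where-style conditional keeps the plain logarithm for the non-zero rows.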

Upvotes: 1
