Reputation: 2331
I want to set a virtual column to a calculation using another column in Vaex. I need to use an if statement inside this calculation. In general I want to call
df['calculation_col'] = log(df['original_col']) if df['original_col'] == 0 else -4
I then try to run the count function in Vaex:
hist = df.count(
binby='calculation_col',
limits=limits,
shape=binnum,
delay=True
)
When I try to execute this code I get the error ValueError: zero-size array to reduction operation minimum which has no identity.
How can I use a conditional for a virtual column in Vaex?
Upvotes: 3
Views: 514
Reputation: 813
Probably the most "vaex" way to do this would be to use where:
import vaex
df = vaex.example()
# The syntax is where(condition, if satisfied, else)
df['calculated_col'] = df.func.where(df['x'] < 10, 0, -4)
Upvotes: 3
Reputation: 16561
It might be useful to use a mask for subsetting the relevant rows:
import vaex
df = vaex.example()
mask = df["id"] < 10
df["new_col"] = mask * df["x"] + ~mask * (-4)
print(df[['id', 'x', 'new_col']].head(4))
# # id x new_col
# 0 0 1.23187 1.23187
# 1 23 -0.163701 -4
# 2 32 -2.12026 -4
# 3 8 4.71559 4.71559
Note that in the original script numpy would complain about taking np.log of zero (by default it emits a divide-by-zero warning and returns -inf), so using np.log1p might be more appropriate in that case.
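To make the difference concrete, here is a small numpy-only sketch with made-up values. It compares plain np.log, np.log1p, and a guarded log that substitutes -4 for non-positive inputs. Keep in mind that log1p computes log(1 + x), so it shifts every value and not just the zeros.

```python
import numpy as np

# Made-up sample values, including the problematic zero.
x = np.array([0.0, 1.0, 9.0])

# Plain log: zero maps to -inf (the warning is silenced here on purpose).
with np.errstate(divide="ignore"):
    plain = np.log(x)

# log1p computes log(1 + x): zero maps to 0, but all other values shift too.
shifted = np.log1p(x)

# Guarded log: only take the log of positive values, use -4 elsewhere.
positive = x > 0
guarded = np.where(positive, np.log(np.where(positive, x, 1.0)), -4.0)

print(plain, shifted, guarded)
```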
Upvotes: 1