Reputation: 1905
Suppose I have a DataFrame x with this schema:
from pyspark.sql.types import StructType, StructField, DoubleType
xSchema = StructType([
    StructField("a", DoubleType(), True),
    StructField("b", DoubleType(), True),
    StructField("c", DoubleType(), True),
])
I then have the DataFrame:
DataFrame[a: double, b: double, c: double]
I would like to have an integer derived column. I am able to create a boolean column:
x = x.withColumn('y', (x.a-x.b)/x.c > 1)
My new schema is:
DataFrame[a: double, b: double, c: double, y: boolean]
However, I would like column y to contain 0 for False and 1 for True.
The cast function can only operate on a column, not on a DataFrame, and the withColumn function can only operate on a DataFrame. How do I add a new column and cast it to integer at the same time?
Upvotes: 23
Views: 40231
Reputation: 330273
The expression you use evaluates to a Column, so you can call cast on it directly:
x.withColumn('y', ((x.a-x.b) / x.c > 1).cast('integer')) # Or IntegerType()
Upvotes: 36