Reputation: 759
I am trying to use DataFrames.combine to chain multiple transformations. The desired final DataFrame is the one below.
using DataFrames, Statistics
df = DataFrame(x = repeat([1], 4))
df_2 = combine(df,
               :x => sum => :sum_x)
df_2.sqrt_sum_x .= sqrt.(df_2.sum_x)
println(df_2)
# 1×2 DataFrame
#  Row │ sum_x  sqrt_sum_x
#      │ Int64  Float64
# ─────┼───────────────────
#    1 │     4         2.0
I was wondering whether there is a way to achieve the previous result with a single call to combine, e.g. by using the new target column :sum_x as a source column in a later argument (see code below). However, this throws an ArgumentError because it cannot find the newly computed :sum_x column.
combine(df,
        :x => sum => :sum_x,
        :sum_x => sqrt => :sqrt_sum_x)
# ERROR: ArgumentError: column name :sum_x not found in the data frame
Upvotes: 3
Views: 347
Reputation: 69819
Currently this is not allowed. The reason is that the order of execution of transformations in combine is undefined. In particular, in some situations these operations are executed in parallel using multi-threading (to improve performance).
Additionally, such an operation could be ambiguous to interpret. For example, if you had written:
combine(df,
        :x => sum => :sum_x,
        [:x, :sum_x] => (+) => :x_plus_sum_x)
then in the transformation:
[:x, :sum_x] => (+) => :x_plus_sum_x
:x would come from the source data frame df (and have 4 elements), while :sum_x would come from the not-yet-existing target data frame (and have 1 element). Technically it would be possible to make this work, but we decided that it could be confusing.
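Two possible workarounds are sketched below, assuming a DataFrames.jl 1.x release. The first pipes the result of combine into transform, so :sum_x already exists when the second step runs. The second keeps a single combine call by computing both values in one function that returns a NamedTuple and spreading it into columns with AsTable. The helper name sum_and_sqrt is made up for this illustration and is not part of DataFrames.jl.

```julia
using DataFrames

df = DataFrame(x = repeat([1], 4))

# Option 1: two steps. combine creates :sum_x, then transform
# reads it from the intermediate data frame.
df_2 = transform(combine(df, :x => sum => :sum_x),
                 :sum_x => ByRow(sqrt) => :sqrt_sum_x)

# Option 2: a single combine call. The (hypothetical) helper computes
# both values at once and returns them as a NamedTuple; the AsTable
# target turns the NamedTuple fields into columns.
function sum_and_sqrt(x)
    s = sum(x)
    return (sum_x = s, sqrt_sum_x = sqrt(s))
end

df_3 = combine(df, :x => sum_and_sqrt => AsTable)
```

Both data frames should have one row with sum_x equal to 4 and sqrt_sum_x equal to 2.0. Option 2 avoids the ordering problem because only one transformation is involved, so nothing depends on a column created by another argument.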
Upvotes: 2