Josep Espasa

Reputation: 759

DataFrames.transform specifying target variable with anonymous function in Julia

I am trying to use transform with an anonymous function (x -> uppercase.(x)) and store the new column as "A" by specifying a target column name (:A).

If I don't specify a target column variable (first transformation below), the new variable is produced fine (i.e. a Vector with 5 elements). However, once I specify the target column (second transformation below), the function returns a Vector of Pairs under the "a_function" name.

How can I produce the desired DataFrame with a new column "A" containing a Vector with 5 elements ("A" to "E")? Why does the second transformation below return a Vector of Pairs under a name different from the one specified?

using DataFrames

df_1 = DataFrame(a = ["a", "b", "c", "d", "e"])

df_2 = transform(df_1, :a => x -> uppercase.(x))  # first transformation

df_2
 Row │ a       a_function 
     │ String  String
─────┼────────────────────
   1 │ a       A
   2 │ b       B
   3 │ c       C
   4 │ d       D
   5 │ e       E


df_3 = transform(df_1, :a => x -> uppercase.(x) => :A) # second transformation

df_3
5×2 DataFrame
 Row │ a       a_function
     │ String  Pair…
─────┼───────────────────────────────────────
   1 │ a       ["A", "B", "C", "D", "E"]=>:A
   2 │ b       ["A", "B", "C", "D", "E"]=>:A
   3 │ c       ["A", "B", "C", "D", "E"]=>:A
   4 │ d       ["A", "B", "C", "D", "E"]=>:A
   5 │ e       ["A", "B", "C", "D", "E"]=>:A

Desired outcome DataFrame:

 DataFrame(a = ["a", "b", "c", "d", "e"],
    A = ["A", "B", "C", "D", "E"])

Upvotes: 3

Views: 489

Answers (1)

Bogumił Kamiński

Reputation: 69819

The reason is operator precedence. If you write:

julia> :a => x -> uppercase.(x) => :A
:a => var"#7#8"()

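you see that you have defined only one pair. The whole part uppercase.(x) => :A became the body of your anonymous function, so the function returns a single Pair instead of a vector. transform treats that Pair as a scalar, repeats it in every row, and falls back to the automatically generated name a_function because no target name was actually passed.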

Instead write (note I added ( and ) around the anonymous function):

julia> :a => (x -> uppercase.(x)) => :A
:a => (var"#9#10"() => :A)

to get what you wanted:

julia> df_3 = transform(df_1, :a => (x -> uppercase.(x)) => :A)
5×2 DataFrame
 Row │ a       A
     │ String  String
─────┼────────────────
   1 │ a       A
   2 │ b       B
   3 │ c       C
   4 │ d       D
   5 │ e       E
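
To check how the two spellings parse without involving DataFrames.jl at all, here is a minimal sketch in plain Julia. The variable names unparenthesized and parenthesized are only illustrative and are not part of the original code:

```julia
# Without parentheses the anonymous function's body extends to the end,
# so `=> :A` ends up inside the function.
unparenthesized = :a => x -> uppercase.(x) => :A

# With parentheses the result is a nested pair: source => (function => target).
parenthesized = :a => (x -> uppercase.(x)) => :A

# The right-hand side of the first pair is a function that returns a Pair.
println(unparenthesized.second isa Function)
println(unparenthesized.second(["a", "b"]) isa Pair)

# The right-hand side of the second pair is itself a Pair ending in :A.
println(parenthesized.second isa Pair)
println(parenthesized.second.second == :A)
```

All four lines should print true, which matches the two REPL outputs shown earlier: one pair in the first case, a nested pair in the second.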

In this case a more standard way to write it would be:

julia> transform(df_1, :a => ByRow(uppercase) => :A)
5×2 DataFrame
 Row │ a       A
     │ String  String
─────┼────────────────
   1 │ a       A
   2 │ b       B
   3 │ c       C
   4 │ d       D
   5 │ e       E

or even:

julia> transform(df_1, :a => ByRow(uppercase) => uppercase)
5×2 DataFrame
 Row │ a       A
     │ String  String
─────┼────────────────
   1 │ a       A
   2 │ b       B
   3 │ c       C
   4 │ d       D
   5 │ e       E

The last form is new in DataFrames.jl 1.3, which allows you to pass a function as the destination column name specifier (here the function uppercases the source column name). In this case it is longer, but it can be useful when you define transformations programmatically.
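
As a rough sketch of the programmatic use, assuming DataFrames.jl 1.3 or later, the same pattern can be broadcast over every column. The data frame df_wide and its contents are made up for illustration:

```julia
using DataFrames

# Hypothetical two-column data frame, used only for this illustration.
df_wide = DataFrame(a = ["x", "y"], b = ["p", "q"])

# Broadcast the pair over all column names: each column is uppercased row by row,
# and each target name is produced by applying `uppercase` to the source name.
out = transform(df_wide, names(df_wide) .=> ByRow(uppercase) .=> uppercase)

println(names(out))
```

Here the target names never have to be written out by hand, which is where the function form becomes shorter than listing each :A, :B, and so on.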

Upvotes: 4
