Reputation: 346
I am trying to add some computed columns to a SparkR data frame, as follows:
Orders <- withColumn(Orders, "Ready.minus.In.mins",
(unix_timestamp(Orders$ReadyTime) - unix_timestamp(Orders$InTime)) / 60)
Orders <- withColumn(Orders, "Out.minus.In.mins",
(unix_timestamp(Orders$OutTime) - unix_timestamp(Orders$InTime)) / 60)
The first command executes ok, and head(Orders)
reveals the new column. The second command throws the error:
15/12/29 05:10:02 ERROR RBackendHandler: col on 359 failed
Error in select(x, x$"*", alias(col, colName)) :
error in evaluating the argument 'col' in selecting a method for function
'select': Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: Cannot resolve column name
"Ready.minus.In.mins" among (ASAP, AddressLine, BasketCount, CustomerEmail, CustomerID, CustomerName, CustomerPhone, DPOSCustomerID, DPOSOrderID, ImportedFromOldDb, InTime, IsOnlineOrder, LineItemTotal, NetTenderedAmount, OrderDate, OrderID, OutTime, Postcode, ReadyTime, SnapshotID, StoreID, Suburb, TakenBy, TenderType, TenderedAmount, TransactionStatus, TransactionType, hasLineItems, Ready.minus.In.mins);
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
at org.apache.spark.sql.DataFrame$$anonfun$resolve$1.apply(DataFrame.scala:159)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:158)
at org.apache.spark.sql.DataFrame$$anonfun$col$1.apply(DataFrame.scala:650)
at org.apa
Do I need to do something to the data frame after adding the new column before it will accept another one?
Upvotes: 0
Views: 1912
Reputation: 78
From the link, just use backsticks, when accessing the column, e.g.:
From using
df['Fields.fields1']
or something, use:
df['`Fields.fields1`']
Upvotes: 1
Reputation: 346
Found it here: spark-issues mailing list archives
SparkR isn't entirely happy with "." in a column name.
Upvotes: 0