Reputation: 53806
This function:
from pyspark.sql import functions as F
lg = F.log(5.2)
(from http://spark.apache.org/docs/latest/api/python/pyspark.sql.html)
returns:
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.col. Trace:
py4j.Py4JException: Method col([class java.lang.Double]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
The documentation shows the function being used on a DataFrame column:
>>> df.select(log(10.0, df.age).alias('ten')).rdd.map(lambda l: str(l.ten)[:7]).collect()
['0.30102', '0.69897']
>>> df.select(log(df.age).alias('e')).rdd.map(lambda l: str(l.e)[:7]).collect()
['0.69314', '1.60943']
Shouldn't it also be possible to use the log function independently on a single value?
Upvotes: 1
Views: 7936
Reputation: 1881
The functions in pyspark.sql
are meant to be used on DataFrame columns. They expect a column (or a column name) as a parameter, so Spark tries to resolve the value you pass (5.2 in this case) as a column, hence the error.
To apply log
to a plain Python value, use math.log
instead.
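For example, with the standard library's math module (no Spark needed):

```python
import math

# math.log works on plain Python numbers, no DataFrame required
lg = math.log(5.2)       # natural log, like F.log(col)
lg10 = math.log10(5.2)   # base-10 log, like F.log(10.0, col)
print(lg, lg10)
```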
Upvotes: 3