Reputation: 5660
I am using Spark version 2.1 in Databricks. I have a data frame named wamp to which I want to add a column named region, which should take the constant value NE. However, I get an error saying NameError: name 'lit' is not defined when I run the following command:
wamp = wamp.withColumn('region', lit('NE'))
What am I doing wrong?
Upvotes: 13
Views: 51979
Reputation: 14067
You need to import lit. Either
from pyspark.sql.functions import *
which makes lit available directly, or import the module under an alias and qualify the call:
import pyspark.sql.functions as sf
wamp = wamp.withColumn('region', sf.lit('NE'))
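The error itself is ordinary Python name resolution rather than anything Spark-specific. As a minimal sketch that runs without a cluster, the snippet below uses math.sqrt purely as a stand-in for pyspark.sql.functions.lit (that substitution is my assumption for illustration, not part of the original question); the function name call_unimported is also hypothetical.

```python
# Stand-in example: math.sqrt plays the role of pyspark.sql.functions.lit
# so the behaviour can be reproduced without Spark installed.

def call_unimported():
    """Call a name that has not been imported and return the error text."""
    try:
        return sqrt(16)  # 'sqrt' is not bound in this module yet
    except NameError as exc:
        return str(exc)

# Same kind of failure as calling lit('NE') with no import.
error_text = call_unimported()

# Fix 1: import the name itself
# (analogous to: from pyspark.sql.functions import lit).
from math import sqrt
direct = sqrt(16)

# Fix 2: import the module under an alias and qualify the call
# (analogous to: import pyspark.sql.functions as sf; sf.lit('NE')).
import math as m
qualified = m.sqrt(16)

print(error_text)
print(direct, qualified)
```

Either fix works. One reason some people prefer the named or aliased import over the wildcard form is that pyspark.sql.functions defines names such as sum, min and max, so import * replaces Python's built-ins of the same name in your namespace.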
Upvotes: 34
Reputation: 1397
@muon provided the correct answer above. Here is a quick reproducible version for clarity.
>>> from pyspark.sql.functions import lit
>>> df = spark.createDataFrame([(1, 4, 3)], ['a', 'b', 'c'])
>>> df.show()
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|  4|  3|
+---+---+---+
>>> df = df.withColumn("d", lit(5))
>>> df.show()
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  1|  4|  3|  5|
+---+---+---+---+
Upvotes: 4