I have a JSON document shaped like this (the schema isn't under my control, so I can't get rid of the hyphen in the key):
{
  "col1": "value1",
  "dictionary-a": {
    "col2": "value2"
  }
}
I use session.read.json(...) to read this JSON into a DataFrame (named 'df') like this:
df = session.read.json('/path/to/json.json')
I want to do this:
df2 = df.withColumn("col2", df.dictionary-a.col2)
I get the error:
AttributeError: 'DataFrame' object has no attribute 'dictionary'
How can I reference columns with hyphens in their names in pyspark column expressions?
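As you have it, the hyphen in df.dictionary-a.col2 is evaluated as subtraction: df.dictionary - a.col2. Python evaluates the left operand first, so the attribute lookup df.dictionary fails before the subtraction is ever attempted, which is why the error mentions 'dictionary'.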
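You can confirm this parse without Spark by asking Python's own parser how it reads the expression. This is a small sketch using the standard-library `ast` module (`ast.unparse` needs Python 3.9+); nothing is evaluated, so `df` and `a` don't need to exist.

```python
import ast

# Parse the expression from the question exactly as the Python compiler would.
tree = ast.parse("df.dictionary-a.col2", mode="eval")
expr = tree.body

# The top-level node is a binary subtraction, not one attribute lookup.
print(type(expr).__name__, type(expr.op).__name__)
# Left operand is df.dictionary, right operand is a.col2.
print(ast.unparse(expr.left), "|", ast.unparse(expr.right))
```

Attribute access binds tighter than `-`, so the hyphen splits the expression into two attribute lookups joined by subtraction.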
Instead, you can use pyspark.sql.functions.col to refer to the column by name and pyspark.sql.Column.getItem to access an element of the dictionary by key.
Try:
from pyspark.sql.functions import col
df2 = df.withColumn("col2", col("dictionary-a").getItem("col2"))
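For reference, here is a sketch of some other ways to address the same nested field. `spark.read.json` infers a JSON object as a struct column, and `Column.getField` is the struct-specific accessor. Bracket indexing also works, as does a backtick-quoted path, either passed to `col` or written as a SQL string for `expr`. The backticks make Spark treat the hyphen as part of the name instead of as minus. `quoted_path` and `col2_variants` are hypothetical helper names introduced only for this illustration. pyspark is imported inside the function so the quoting helper can be used and tested on its own.

```python
def quoted_path(*parts: str) -> str:
    """Hypothetical helper (not part of PySpark): build a dotted column path
    with every part wrapped in backticks, doubling any embedded backtick."""
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


def col2_variants(df):
    """Hypothetical helper: select dictionary-a.col2 in four equivalent ways."""
    # Imported lazily so quoted_path above works without pyspark installed.
    from pyspark.sql.functions import col, expr

    return df.select(
        # Explicit struct-field accessor on the column named by col().
        col("dictionary-a").getField("col2").alias("via_getfield"),
        # DataFrame and Column indexing; no SQL parsing, so '-' is literal.
        df["dictionary-a"]["col2"].alias("via_brackets"),
        # Dotted path string: `dictionary-a`.`col2`
        col(quoted_path("dictionary-a", "col2")).alias("via_quoted_path"),
        # SQL expression string; the backticks stop '-' being read as minus.
        expr("`dictionary-a`.col2").alias("via_sql_expr"),
    )
```

With df loaded from the sample document, `col2_variants(df).show()` should show the same value in all four columns. In practice you would pick one form and use it in `withColumn`.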