Reputation: 7894
I'm calling the following:
propertiesDF.select(
  col("timestamp"),
  col("coordinates")(0) as "lon",
  col("coordinates")(1) as "lat",
  col("properties.tide (above mllw)") as "tideAboveMllw",
  col("properties.wind speed") as "windSpeed")
This gives me the following error:
org.apache.spark.sql.AnalysisException: No such struct field tide (above mllw) in air temperature, atmospheric pressure, dew point, dominant wave period, mean wave direction, name, program name, significant wave height, tide (above mllw):, visibility, water temperature, wind direction, wind speed;
Now there definitely is such a struct field. (The error message itself says so.)
Here is the schema:
root
 |-- timestamp: long (nullable = true)
 |-- coordinates: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- properties: struct (nullable = true)
 |    |-- air temperature: double (nullable = true)
 |    |-- atmospheric pressure: double (nullable = true)
 |    |-- dew point: double (nullable = true)
.
.
.
 |    |-- tide (above mllw):: string (nullable = true)
.
.
.
The input is read as JSON like this:
val df = sqlContext.read.json(dirName)
How do I handle parentheses in a column name?
Upvotes: 0
Views: 1620
Reputation: 330373
You should avoid names like this in the first place, but you can either split the access path:
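import org.apache.spark.sql.functions.{col, lit, struct}

// Example DataFrame with a struct column whose field names contain spaces and parentheses
val df = spark.range(1).select(struct(
  lit(123).as("tide (above mllw)"),
  lit(1).as("wind speed")
).as("properties"))

// Select the struct column first, then extract the field by name
df.select(col("properties").getItem("tide (above mllw)"))
// or
df.select(col("properties")("tide (above mllw)"))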
or enclose the problematic field name in backticks:
df.select(col("properties.`tide (above mllw)`"))
Both solutions assume your data has the following structure (based on the access path you use in your queries):
df.printSchema
// root
//  |-- properties: struct (nullable = false)
//  |    |-- tide (above mllw): integer (nullable = false)
//  |    |-- wind speed: integer (nullable = false)
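One more thing worth checking: judging by the schema and the error message in the question, the actual field name appears to be "tide (above mllw):", with a trailing colon. If so, the parentheses are probably not the cause and the lookup fails because the colon is missing from the requested name. The following is a minimal sketch under that assumption. The local SparkSession, the sample value "1.2" and the val names are made up for illustration. It applies the same two access styles to the full name, colon included:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

// Assumption: a throwaway local session, only for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("struct-field-with-colon")
  .getOrCreate()

// Assumption: the real field name ends with a colon, as the question's schema suggests
val fieldName = "tide (above mllw):"

// Tiny stand-in for the question's data; "1.2" is an invented sample value
val sample = spark.range(1).select(struct(
  lit("1.2").as(fieldName),
  lit(1).as("wind speed")
).as("properties"))

// Backtick-quoted path, with the trailing colon inside the backticks
val viaBackticks = sample
  .select(col(s"properties.`$fieldName`").as("tideAboveMllw"))
  .first()
  .getString(0)

// Split access path: pick the struct column, then extract the field by its exact name
val viaGetItem = sample
  .select(col("properties").getItem(fieldName).as("tideAboveMllw"))
  .first()
  .getString(0)
```

Aliasing the extracted field (here to "tideAboveMllw") right away also means the awkward name does not leak into later stages of the query.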
Upvotes: 2
Reputation: 5049
Based on the documentation, you might try single quotes, like this:
propertiesDF.select(
  col("timestamp"),
  col("coordinates")(0) as "lon",
  col("coordinates")(1) as "lat",
  col("'properties.tide (above mllw)'") as "tideAboveMllw",
  col("properties.wind speed") as "windSpeed")
Upvotes: 0