Reputation: 339
This should be pretty straightforward, but I'm having an issue with the following code:
```scala
val test = spark.read
  .option("header", "true")
  .option("delimiter", ",")
  .csv("sample.csv")

test.select("Type").show()
test.select("Provider Id").show()
```
`test` is a DataFrame like so:
Type | Provider Id |
---|---|
A | asd |
A | bsd |
A | csd |
B | rrr |
The second `select` fails with:

```
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '`Provider Id`' given input columns: [Type, Provider Id];;
'Project ['Provider Id]
```
It selects and shows the `Type` column just fine, but I couldn't get it to work for `Provider Id`. I wondered whether that was because the column name contains a space, so I tried using backticks and removing or replacing the space, but nothing seemed to work. Also, it runs fine when I use the Spark 3.x libraries but fails with Spark 2.1.x (and I need to use 2.1.x).
Additionally, I tried changing the CSV column order from `Type`, `Provider Id` to `Provider Id`, `Type`. The error was reversed: `Provider Id` now shows, but `Type` throws the exception.
Any suggestions?
Upvotes: 1
Views: 1342
Reputation: 11
```scala
test.printSchema()
```

You can use the output of `printSchema()` to see exactly how Spark read your column names, then use those exact names in your code.
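Following the `printSchema()` suggestion, here is a minimal sketch for making hidden characters in column names visible and stripping them. It rests on an assumption that is not confirmed in the thread: that the header contains an invisible character, such as a trailing carriage return (`\r`, e.g. from Windows `\r\n` line endings) or a leading byte order mark. A trailing `\r` would fit the column-swap experiment, since it would stick to whichever column comes last. `ColumnNameCheck`, `visible`, and `clean` are hypothetical names, and the helpers are plain Scala so they run without Spark.

```scala
// Hypothetical helpers (not part of Spark) for inspecting and cleaning
// column names that may contain invisible characters.
object ColumnNameCheck {

  // Turn one character into a visible form: \r, \n, \t, or <U+XXXX>
  // for other control (Cc) and format (Cf) characters such as a BOM.
  private def escapeChar(c: Char): String = c match {
    case '\r' => "\\r"
    case '\n' => "\\n"
    case '\t' => "\\t"
    case _ if Character.isISOControl(c) || Character.getType(c) == Character.FORMAT =>
      f"<U+${c.toInt}%04X>"
    case _ => c.toString
  }

  // Render a whole column name so hidden characters show up when printed.
  def visible(name: String): String = {
    val sb = new StringBuilder
    name.foreach(c => sb.append(escapeChar(c)))
    sb.toString
  }

  // Drop control/format characters anywhere in the name, then trim spaces
  // at both ends; the inner space in "Provider Id" is kept.
  def clean(name: String): String =
    name.replaceAll("[\\p{Cc}\\p{Cf}]", "").trim

  def main(args: Array[String]): Unit = {
    // Hypothetical names as spark.read.csv(...).columns might return them
    // if the header line ended in "\r\n"; these are not observed values.
    val names = Array("Type", "Provider Id\r")

    names.foreach(n => println(s"[${visible(n)}]"))
    // prints [Type] and then [Provider Id\r]

    println(names.map(clean).mkString("|"))
    // prints Type|Provider Id
  }
}
```

In the Spark job itself, `test.columns.foreach(c => println(ColumnNameCheck.visible(c)))` would show the names exactly as parsed, and `test.toDF(test.columns.map(ColumnNameCheck.clean): _*)` returns a DataFrame with the cleaned names. If the hidden-character guess is right, `select("Provider Id")` should then resolve on that new DataFrame.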
Upvotes: 1