strivn

Reputation: 339

Scala Spark - Cannot resolve a column name

This should be pretty straightforward, but I'm having an issue with the following code:

val test = spark.read
    .option("header", "true")
    .option("delimiter", ",")
    .csv("sample.csv")

test.select("Type").show()
test.select("Provider Id").show()

test is a DataFrame like this:

Type | Provider Id
A    | asd
A    | bsd
A    | csd
B    | rrr

The second select fails with:

Exception in thread "main" org.apache.spark.sql.AnalysisException: 
cannot resolve '`Provider Id`' given input columns: [Type, Provider Id];;
'Project ['Provider Id]

It selects and shows the Type column just fine, but I can't get it to work for Provider Id. I wondered whether the space in the column name was the problem, so I tried backticks and also removing or replacing the space, but nothing worked. The same code runs fine with the Spark 3.x libraries but fails with Spark 2.1.x, and I need to use 2.1.x.

Additionally, I tried swapping the column order in the CSV so that Provider Id comes first and Type second. The error flipped: Provider Id now shows, but selecting Type throws the exception.
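Because the failure follows the column's position (the last column fails) rather than its name, one plausible explanation, which the question does not confirm, is that the header contains an invisible character, such as a trailing carriage return, that ends up in the column name. The error message would then print a name that looks identical to the one you typed. The sketch below uses a hypothetical helper, visible, that escapes every non-printable character so two names that print the same can be told apart. It uses only the Scala standard library.

```scala
object ColumnNameDebug {
  // Hypothetical helper: keep printable ASCII as-is and render every other
  // character as <U+XXXX>, so invisible characters become visible.
  def visible(name: String): String =
    name.map { c =>
      if (c >= ' ' && c <= '~') c.toString
      else f"<U+${c.toInt}%04X>"
    }.mkString

  def main(args: Array[String]): Unit = {
    // Two names that look the same on a terminal but differ in content.
    val expected = "Provider Id"
    val suspect  = "Provider Id\r"
    println(visible(expected))   // Provider Id
    println(visible(suspect))    // Provider Id<U+000D>
    println(expected == suspect) // false
  }
}
```

Against the question's DataFrame, something like test.columns.map(ColumnNameDebug.visible).foreach(println) would show whether the stored names carry extra characters.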

Any suggestions?

Upvotes: 1

Views: 1342

Answers (1)

Kateu Herbert

Reputation: 11

test.printSchema()

The output of printSchema() shows exactly how Spark read your column names; use those exact names in your code.
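If the printed name turns out to contain stray characters, one way to keep using the names you expect is to rename every column positionally with toDF. This is a minimal sketch, assuming the stray characters are a byte-order mark, control characters such as a carriage return, or surrounding whitespace (the question does not confirm which). cleanName is a hypothetical helper, not a Spark API, and the block itself uses only the Scala standard library.

```scala
object ColumnNameCleanup {
  // Hypothetical helper: drop any byte-order mark (U+FEFF) and ISO control
  // characters such as '\r', then trim surrounding whitespace.
  def cleanName(name: String): String =
    name.filterNot(c => c.toInt == 0xFEFF || Character.isISOControl(c)).trim

  def main(args: Array[String]): Unit = {
    // Made-up header names standing in for what Spark might have read.
    val raw = Array(s"${0xFEFF.toChar}Type", "Provider Id\r")
    val cleaned = raw.map(cleanName)
    println(cleaned.mkString("[", ", ", "]")) // [Type, Provider Id]
  }
}
```

In the question's code this could be applied as val fixed = test.toDF(test.columns.map(ColumnNameCleanup.cleanName): _*), after which fixed.select("Provider Id") should resolve if a stray character was the cause. Renaming by position sidesteps having to type the hidden character at all.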

Upvotes: 1
