Reddspark

Reputation: 7557

PySpark: "Cannot resolve column name" when the column does exist

I had some PySpark code that was working with a sample CSV blob, and then I decided to point it at a bigger dataset. This line:

df = df.withColumn("TransactionDate", df["TransactionDate"].cast(TimestampType()))

Is now throwing this error:

AnalysisException: u'Cannot resolve column name "TransactionDate" among ("TransactionDate","Country ...

Clearly TransactionDate exists as a column in the dataset, so why is it suddenly not working?

Upvotes: 0

Views: 734

Answers (1)

Reddspark

Reputation: 7557

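Ah OK, I figured it out. If you hit this issue, check your delimiter. In my new dataset it was "," whereas in my smaller sample it was "|". With the wrong delimiter the header row is most likely read as a single column whose name is the entire header line, which would explain why "TransactionDate" shows up in the error message but cannot be resolved. Setting the delimiter explicitly fixed it: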

df = spark.read.format(file_type).options(header='true', quote='"', delimiter=",", ignoreLeadingWhiteSpace='true', inferSchema='true').load(file_location)
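
If you want to catch this before loading, here is a rough sketch in plain Python (no Spark needed) that guesses the delimiter from the header line and shows what splitting on the wrong one does. Note that `guess_delimiter`, the candidate list and the sample header are all made up for illustration; they are not part of PySpark, and simple counting will be fooled if a delimiter character appears inside a quoted field.

```python
def guess_delimiter(header_line, candidates=(",", "|", ";", "\t")):
    """Hypothetical helper: return the candidate that occurs most often in the header line."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("none of the candidate delimiters appear in the header")
    return best


# Made-up header line, standing in for the first line of the real file.
header = '"TransactionDate","Country","Amount"'

# Splitting on the wrong delimiter leaves the whole header as a single "column".
wrong_split = header.split("|")

# Splitting on the guessed delimiter gives one entry per real column.
right_split = header.split(guess_delimiter(header))

print(len(wrong_split), wrong_split)
print(len(right_split), right_split)
```

The guessed value could then be passed as the delimiter option in the read call above instead of hard-coding it.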

Upvotes: 1
