Reputation: 7557
I had some PySpark code that was working with a sample CSV blob, and then I decided to point it at a bigger dataset. This line:
df = df.withColumn("TransactionDate", df["TransactionDate"].cast(TimestampType()))
is now throwing this error:
AnalysisException: u'Cannot resolve column name "TransactionDate" among ("TransactionDate","Country ...
Clearly TransactionDate exists as a column in the dataset so why is it suddenly not working?
Upvotes: 0
Views: 734
Reputation: 7557
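Ah, OK, I figured it out. If you get this issue, check your delimiter. In my new dataset it was "," whereas in my smaller sample it was "|". With the wrong delimiter the whole header row is read as a single column name, so "TransactionDate" never exists as a column of its own, even though the text appears in the error message.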
df = spark.read.format(file_type).options(header='true', quote='"', delimiter=',', ignoreLeadingWhiteSpace='true', inferSchema='true').load(file_location)
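If you want to catch this before the cast fails, here is a rough sketch of two checks in plain Python. The names `guess_delimiter`, `looks_misparsed` and `CANDIDATES` are made up for illustration and are not part of PySpark. It assumes you can get at the header line as text and that the delimiter is one of a few common characters.

```python
import csv
import io

# Assumed set of likely delimiters; extend it if your files use something else.
CANDIDATES = [",", "|", ";", "\t"]


def guess_delimiter(header_line, candidates=CANDIDATES):
    """Return the candidate that splits the header into the most fields.

    Returns None if no candidate produces more than one field.
    """
    best, best_count = None, 1
    for delim in candidates:
        reader = csv.reader(io.StringIO(header_line), delimiter=delim, quotechar='"')
        fields = next(reader, [])
        if len(fields) > best_count:
            best, best_count = delim, len(fields)
    return best


def looks_misparsed(columns):
    """True if there is exactly one column and its name still contains a delimiter.

    That pattern usually means the header row was never split.
    """
    return len(columns) == 1 and any(d in columns[0] for d in CANDIDATES)
```

You could pass the result of `guess_delimiter` to the `delimiter` option above, and call `looks_misparsed(df.columns)` right after loading to fail early with a clearer message than the AnalysisException.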
Upvotes: 1