Preben Brudvik Olsen
Preben Brudvik Olsen

Reputation: 63

Databricks autoloader writing data with invalid characters in column name

when trying to use databricks' autoloader for writing data, the nested columns contain invalid characters

Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.

How to deal with this issue? Note again that it is the nested columns, not the outermost columns themselves. The latter would be easily fixed with a

for col in df.columns:
    df = df.select([col(c).alias(re.sub("[^0-9a-zA-Z\_]+","",c)) for c in df.columns])

How do I reach the nested columns, as they're not yet exploded?

Upvotes: 0

Views: 2070

Answers (1)

Christopher Grant
Christopher Grant

Reputation: 101

If you're writing to Delta Lake you can use column mapping to get around this. Specifically, use option("delta.columnMapping.mode", "name").

Upvotes: 0

Related Questions