Reputation: 63
When I try to write data with Databricks Auto Loader, the write fails because the nested columns contain invalid characters:
Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.
How do I deal with this? Note again that the problem is in the nested columns, not the outermost columns themselves. The latter would be easily fixed with:
import re
from pyspark.sql.functions import col
# Strip every character outside [0-9a-zA-Z_] from the top-level column names
df = df.select([col(c).alias(re.sub(r"[^0-9a-zA-Z_]+", "", c)) for c in df.columns])
How do I reach the nested columns, as they're not yet exploded?
Upvotes: 0
Views: 2070
Reputation: 101
If you're writing to Delta Lake you can use column mapping to get around this. Specifically, use option("delta.columnMapping.mode", "name").
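If column mapping isn't available to you, another route is to clean the names inside the schema itself. Below is a minimal sketch, not a tested Auto Loader recipe: it walks the dict form of a Spark schema (the shape that `df.schema.jsonValue()` returns) and strips the offending characters from every nested field name. The helper name `sanitize_type` and the sample schema are made up for illustration, and it assumes that no two sibling fields collapse to the same name after cleaning.

```python
import re

# Same character class as in the question: keep only letters, digits, underscore.
INVALID = re.compile(r"[^0-9a-zA-Z_]+")

def sanitize_type(t):
    """Return a copy of a Spark schema (JSON/dict form) with cleaned field names.

    Hypothetical helper: recurses through struct, array and map types.
    """
    if not isinstance(t, dict):
        return t  # primitive type such as "string" or "long"
    kind = t.get("type")
    if kind == "struct":
        return {**t, "fields": [
            {**f, "name": INVALID.sub("", f["name"]), "type": sanitize_type(f["type"])}
            for f in t["fields"]
        ]}
    if kind == "array":
        return {**t, "elementType": sanitize_type(t["elementType"])}
    if kind == "map":
        return {**t,
                "keyType": sanitize_type(t["keyType"]),
                "valueType": sanitize_type(t["valueType"])}
    return t

# Made-up schema with invalid characters at several nesting levels.
schema = {"type": "struct", "fields": [
    {"name": "user info", "nullable": True, "metadata": {}, "type": {
        "type": "struct", "fields": [
            {"name": "first name", "type": "string", "nullable": True, "metadata": {}},
            {"name": "tags(raw)", "nullable": True, "metadata": {}, "type": {
                "type": "array", "containsNull": True, "elementType": {
                    "type": "struct", "fields": [
                        {"name": "k=v", "type": "string", "nullable": True, "metadata": {}},
                    ]}}},
        ]}},
]}

clean = sanitize_type(schema)
```

In PySpark the cleaned dict can presumably be turned back into a schema with `StructType.fromJson(clean)`, and each top-level column then cast to its cleaned type and aliased to its cleaned name, since a struct-to-struct cast matches fields by position. I haven't verified that last step against an Auto Loader stream, so treat it as an assumption.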
Upvotes: 0