Databricks autoloader writing data with invalid characters in column name

Question

When I use Databricks Auto Loader to write data, the write fails because the nested columns contain invalid characters:

Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.

How can I deal with this? Note again that the problem is in the nested columns, not the outermost columns themselves. The latter would be easily fixed with:

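import re
from pyspark.sql.functions import col
# Drop every character that is not alphanumeric or an underscore from the top-level names.
df = df.select([col(c).alias(re.sub(r"[^0-9a-zA-Z_]+", "", c)) for c in df.columns])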

How do I reach the nested columns, as they're not yet exploded?

Answers (1)
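
One possible approach, offered as a sketch and not as a definitive fix: walk the schema recursively, clean every field name, and cast each top-level column to the cleaned type. The sketch assumes that `df.schema.jsonValue()` returns the usual nested dict layout (`struct` / `array` / `map` nodes) and that casting a struct to a struct with the same field order renames its fields by position. `sanitize_schema` and `rename_nested_columns` are hypothetical helper names, not part of PySpark or Auto Loader.

```python
import re

# Characters outside this set are dropped, mirroring the regex in the question.
_INVALID = re.compile(r"[^0-9a-zA-Z_]+")


def sanitize_schema(node):
    """Return a copy of a Spark schema JSON node with every field name cleaned."""
    if isinstance(node, str):
        # Primitive types ("string", "long", ...) are plain strings.
        return node
    kind = node.get("type")
    if kind == "struct":
        fields = [
            {**f, "name": _INVALID.sub("", f["name"]), "type": sanitize_schema(f["type"])}
            for f in node["fields"]
        ]
        return {**node, "fields": fields}
    if kind == "array":
        return {**node, "elementType": sanitize_schema(node["elementType"])}
    if kind == "map":
        return {
            **node,
            "keyType": sanitize_schema(node["keyType"]),
            "valueType": sanitize_schema(node["valueType"]),
        }
    return node  # any other node kind is left untouched


def rename_nested_columns(df):
    """Hypothetical helper: cast each top-level column to its sanitized type."""
    # Imported lazily so sanitize_schema stays usable without a Spark install.
    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType

    new_schema = StructType.fromJson(sanitize_schema(df.schema.jsonValue()))
    return df.select([
        # Backticks keep names with dots or spaces from being parsed as paths.
        col(f"`{old.name}`").cast(new.dataType).alias(new.name)
        for old, new in zip(df.schema.fields, new_schema.fields)
    ])
```

Calling `df = rename_nested_columns(df)` before the write would then rename nested fields without exploding them. Note that stripping characters can make two sibling names collide (for example `a b` and `ab`), so it is worth checking the cleaned schema for duplicates first.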