Reputation: 6139
I am trying to generalize the schema for creating empty tables in PySpark. My list holds a column name and a datatype separated by a space.
Below is my code.
I could generalize the column name, but it is not able to cast the type.
from pyspark.sql.types import *

tblColumns = ['emp_name StringType()'
              , 'confidence DoubleType()'
              , 'addressType StringType()'
              , 'reg StringType()'
              , 'inpindex IntegerType()'
              ]

def createEmptyTable(tblColumns):
    structCols = [StructField(colName.split(' ')[0], (colName.split(' ')[1]), True)
                  for colName in tblColumns]
    print('Returning cols', structCols)
    return(structCols)

createEmptyTable(tblColumns)
This gives the error below.
AssertionError: dataType StringType() should be an instance of <class 'pyspark.sql.types.DataType'>
Is there a way to make the datatype generic?
Upvotes: 0
Views: 2425
Reputation: 361
It's throwing an error because the value you pass is a string, not a DataType instance.
You need to resolve the string to the actual type through a mapping.
So, for example, instead of (colName.split(' ')[1])
you should look the name up in a mapping table:
from pyspark.sql.types import *

datatype = {
    'StringType': StringType,
    'DoubleType': DoubleType,
    'IntegerType': IntegerType,
    # ... add an entry for every type you use
}

def createEmptyTable(tblColumns):
    # strip the trailing '()' from e.g. 'StringType()' before the lookup,
    # then call the mapped class to get a DataType instance
    structCols = [StructField(colName.split(' ')[0],
                              datatype[colName.split(' ')[1].replace('()', '')](),
                              True)
                  for colName in tblColumns]
    return structCols
This should work; be aware that you will have to declare a mapping entry for every type you use.
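To actually create the empty table from those columns, you can wrap them in a StructType and pass it to createDataFrame together with an empty list of rows. A minimal sketch, assuming a SparkSession named spark is already available (the variable names emptyDF and schema are just illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

# assumes a running SparkSession; getOrCreate() reuses one if it exists
spark = SparkSession.builder.getOrCreate()

# build the schema from the generated StructFields
schema = StructType(createEmptyTable(tblColumns))

# an empty list of rows plus the schema yields an empty DataFrame
emptyDF = spark.createDataFrame([], schema)
emptyDF.printSchema()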
Upvotes: 1