DataNoob

Reputation: 205

Python datatypes to pyspark.sql.types auto conversion

I need to create a DataFrame from a set of column names and data types. But the data types are given as str, int, float, etc., and I need to convert them to StringType, IntegerType, etc. for StructType/StructField.

I can write a simple mapping to do the job, but I'd like to know if there is any automatic conversion between these types. The kind of mapping I mean is sketched below.
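For illustration, a minimal version of that manual mapping (the column names and type strings here are made up):

from pyspark.sql.types import (StructType, StructField,
                               StringType, IntegerType, FloatType)

# hand-written lookup from Python type names to Spark SQL types
type_map = {'str': StringType(), 'int': IntegerType(), 'float': FloatType()}

columns = [('name', 'str'), ('age', 'int'), ('score', 'float')]  # example input
schema = StructType([StructField(n, type_map[t], True) for n, t in columns])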

Upvotes: 0

Views: 3445

Answers (2)

programort

Reputation: 159

You can do that by using the following function:

>>> from pyspark.sql.types import _infer_type
>>> _infer_type([1.0, 2.0])
ArrayType(DoubleType,true)

If you have the Python type itself rather than a value, you can instantiate it and infer from the instance:

>>> my_type = type(42)
>>> _infer_type(my_type())
LongType

Finally, if you only have a string naming the Python type, you can look it up with pydoc.locate. Note that _infer_type needs an instance, so the located class has to be called to create one:

>>> from pydoc import locate
>>> _infer_type(locate('int')())
LongType
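Putting these together, here is a sketch of building a StructType from (name, type-string) pairs (the column names are made up, and keep in mind that _infer_type is a private helper, so it may change between Spark versions):

>>> from pydoc import locate
>>> from pyspark.sql.types import StructType, StructField, _infer_type
>>> cols = [('name', 'str'), ('age', 'int')]
>>> StructType([StructField(n, _infer_type(locate(t)()), True) for n, t in cols])
StructType(List(StructField(name,StringType,true),StructField(age,LongType,true)))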


Upvotes: 1

Priyank Bangar

Reputation: 86

I know it's been a while, but you can try the following:

from pyspark.sql.types import _parse_datatype_string

Then you can use it as follows:

_parse_datatype_string('int')  # converts the string 'int' to PySpark's IntegerType

NOTE: The type name has to be passed as a string.
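It also accepts DDL-formatted schema strings, so (as a sketch, with made-up column names) an entire schema can be parsed in one call. Note that _parse_datatype_string needs an active SparkContext, since the parsing happens on the JVM side:

from pyspark.sql.types import _parse_datatype_string

schema = _parse_datatype_string('name string, age int')
# StructType(List(StructField(name,StringType,true),StructField(age,IntegerType,true)))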

Reference: https://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/sql/types.html

Upvotes: 4
