Alon

Reputation: 11935

Creating a DataFrame with a complex schema that includes MapType in PySpark

I'm trying to create a dataframe with the following schema:

 |-- data: struct (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- keyNote: struct (nullable = true)
 |    |    |-- key: string (nullable = true)
 |    |    |-- note: string (nullable = true)
 |    |-- details: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)

This is the best I managed to do:

schema = StructType([
    StructField("id",LongType(), True),
    StructField("keyNote",StructType([
            StructField("key",StringType(),True),
            StructField("note",StringType(),True)
        ])),
    StructField("details",MapType(StringType, StringType, True))
    ])

df = spark\
    .createDataFrame([("idd",("keyy","notee"),("keyy","valuee")),schema])

But I'm getting an exception:

AssertionError: keyType should be DataType

Upvotes: 0

Views: 589

Answers (1)

Enayat

Reputation: 4052

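MapType expects DataType instances for its key and value types, so the type classes need to be called:

MapType(StringType(), StringType(), True)

In your schema you passed the StringType class itself (no parentheses) instead of StringType(), which is what triggers "AssertionError: keyType should be DataType".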

Upvotes: 1
