Mohit Rane

Reputation: 279

Import Schema on pyspark dataframe

I am new to python. I am trying to read a JSON file that contains my schema definition. It looks like :

{
  "type" : "struct",
  "fields" : [ {
    "name" : "name",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "address",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "comment",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  } ]
}
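For reference, this is the same JSON layout that PySpark itself emits for a schema, so a file like this can be produced from an existing DataFrame along these lines (a sketch; inputDf is the DataFrame from the code below, and schema.txt is just an example file name):

# serialize the DataFrame's schema to the JSON format shown above
with open('./schema.txt', 'w') as f:
    f.write(inputDf.schema.json())  # .json() returns the StructType as a JSON string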

I have a data set to which I need to apply the above JSON schema. I have tried the code below:

targetDf = spark.createDataFrame(inputDf.rdd, schemaFieldsOne)

However, here 'schemaFieldsOne' needs to be a StructType. I want to read the JSON file and convert it into a PySpark StructType so that I can apply it to my data frame.

Upvotes: 0

Views: 2043

Answers (1)

thePurplePython

Reputation: 2767

Try this: load the schema file with the json module, then build a StructType with StructType.fromJson:

import json

import pyspark.sql.types as T

with open('./schema.txt', 'r') as S:  # path to your schema file
    saved_schema = json.load(S)       # json.load parses the file straight into a dict

# StructType.fromJson expects the parsed dict, not a JSON string
schema = T.StructType.fromJson(saved_schema)

df = spark.createDataFrame(yourRdd, schema)
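Applied to the variables from the question, the same pattern would look like this (a sketch, reusing the asker's inputDf; not tested against their data):

targetDf = spark.createDataFrame(inputDf.rdd, schema)
targetDf.printSchema()  # should list name, address, comment as nullable strings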

Upvotes: 1
