Reputation: 1013
I have a large dataset and some columns have String data-type. Because of typo mistakes, some cells contain None values written in different styles (lowercase or capital letters, with or without spaces, with or without brackets, etc.).
I want to find all those values and convert them into null. A sample dataset is below:
data = [
    ("A", "None", 1),
    ("A", "(None)", 2),
    ("A", "none", 3),
    ("A", "[None]", 4),
    ("A", "(none)", 5),
    ("A", "(none", 6),
    ("A", "none ", 7),
    (" NOne ", "B", 8),
]
# Create DataFrame
columns = ["col_1", "col_2", "Number"]
df = spark.createDataFrame(data=data, schema=columns)
Any idea how to do that?
Upvotes: 2
Views: 177
Reputation: 26676
I assume valid values in col_2 contain only alphanumeric characters. The steps are:
1. Upper-case all values in col_2.
2. Replace non-alphanumeric characters with nothing.
3. Remove leading and trailing spaces.
4. Use df.replace to replace NONE with null.
Code below.
from pyspark.sql.functions import regexp_replace, trim, upper

df.withColumn('col_2', trim(regexp_replace(upper('col_2'), r'\W', ''))).na.replace('NONE', None).show()
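For reference, the same normalization logic can be sketched in plain Python with the `re` module (the helper `normalize_none` is my own illustration, not part of the Spark API), which may help verify which inputs collapse to null:

```python
import re


def normalize_none(value):
    """Hypothetical helper: return None if value is any styled spelling
    of "None" (e.g. "(none)", "NOne ", "[None]"), else the value unchanged.

    Mirrors the Spark expression above: upper-case, strip non-word
    characters, compare against "NONE".
    """
    if value is None:
        return None
    # \W removes brackets, parentheses, and spaces in one pass
    cleaned = re.sub(r"\W", "", value.upper())
    return None if cleaned == "NONE" else value


samples = ["None", "(None)", "none", "[None]", "(none)", "(none", "none ", "B"]
print([normalize_none(s) for s in samples])
```

Note that running the same transformation over col_1 as well would also catch the " NOne " value in row 8 of the sample data.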
Upvotes: 1