Sam Berchmans

Reputation: 127

Check if a column exists in DF - Java Spark

Is there a method to check whether a particular column exists in a DataFrame using Java Spark? I searched and found suggestions for Python, but nothing for Java.

I am extracting this data from MongoDB and need to check whether certain columns exist. There is no schema validation available in MongoDB for this table.

The following is my schema; I would like to check whether the columns listed in my config exist in it.

 |-- _id: string (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- acctId: string (nullable = true)
 |    |-- conId: string (nullable = true)
 |    |-- dimensions: struct (nullable = true)
 |    |    |-- device: struct (nullable = true)
 |    |    |    |-- accountId: long (nullable = true)
 |    |    |    |-- addFreeTitleTime: timestamp (nullable = true)
 |    |    |    |-- build: string (nullable = true)
 |    |    |    |-- country: string (nullable = true)
 |    |    |    |-- countryOfResidence: string (nullable = true)
 |    |    |    |-- createDate: timestamp (nullable = true)
 |    |    |    |-- number: string (nullable = true)
 |    |    |    |-- FamilyName: string (nullable = true)
 |    |    |    |-- did: long (nullable = true)
 |    |    |    |-- deviceToken: string (nullable = true)
 |    |    |    |-- initialBuildNumber: string (nullable = true)
 |    |    |    |-- language: string (nullable = true)
 |    |    |    |-- major: integer (nullable = true)
 |    |    |    |-- minor: integer (nullable = true)
 |    |    |    |-- model: string (nullable = true)
 |    |    |    |-- modelDesc: string (nullable = true)
 |    |    |    |-- modelId: string (nullable = true)
 |    |    |    |-- modifyDate: timestamp (nullable = true)
 |    |    |    |-- preReg: integer (nullable = true)
 |    |    |    |-- retailer: string (nullable = true)
 |    |    |    |-- serialNumber: string (nullable = true)
 |    |    |    |-- softwareUpdateDate: timestamp (nullable = true)
 |    |    |    |-- softwareVersion: string (nullable = true)
 |    |    |    |-- sourceId: string (nullable = true)
 |    |    |    |-- timeZone: string (nullable = true)
 |    |    |-- location: struct (nullable = true)

Your inputs and suggestions would be of great value.

Thanks in Advance

Upvotes: 0

Views: 1808

Answers (2)

mvasyliv

Reputation: 1214

sourceDF.printSchema
//  root
//  |-- category: string (nullable = true)
//  |-- tags: string (nullable = true)
//  |-- datetime: string (nullable = true)
//  |-- date: string (nullable = true)

  val cols = sourceDF.columns
//  cols: Array[String] = Array(category, tags, datetime, date)

  val IsFieldCategory = cols.filter(_ == "category")
//  IsFieldCategory: Array[String] = Array(category)

Or, to get a Boolean directly:

val isFieldTags = sourceDF.columns.contains("tags")
//  isFieldTags: Boolean = true

Upvotes: 2

Neha Kumari

Reputation: 787

Yes, you can do this in Java by fetching all the column names of a Dataset and checking whether the one you want is among them. A sample:

Dataset<Object1> dataSet = spark.read().text("dataPath").as(Encoders.bean(Object1.class)); // load the data into a Dataset
String[] columns = dataSet.columns(); // fetch all top-level column names
// Exact-match check; Arrays.toString(columns).contains(...) would also match substrings of other names
System.out.println(Arrays.asList(columns).contains("columnNameToCheckFor"));

Here I have used a very naive method to check whether the column name exists in the array of columns; you can use any other method to perform this check.
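Since the question mentions checking against a config of columns, here is a minimal sketch of how the same check could be applied to a whole list of names. The class ColumnCheck and its methods hasColumn and missingColumns are hypothetical helpers, not part of Spark; they only operate on the String[] that dataSet.columns() returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Spark class): works on the array returned by Dataset.columns().
final class ColumnCheck {
    private ColumnCheck() {}

    /** True only if name exactly equals one of the column names (case-sensitive). */
    static boolean hasColumn(String[] columns, String name) {
        return Arrays.asList(columns).contains(name);
    }

    /** Returns the required names that are absent from columns, in the order given. */
    static List<String> missingColumns(String[] columns, List<String> required) {
        Set<String> present = new HashSet<>(Arrays.asList(columns));
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!present.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

It could be called as ColumnCheck.missingColumns(dataSet.columns(), Arrays.asList("_id", "value")). Note that columns() lists only top-level columns, so for the schema in the question this covers _id and value but not nested fields such as value.acctId.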

Upvotes: 2
