Mr.UNOwen

Reputation: 183

Apache Spark DataFrame column names are inconsistent; why does this happen?

Doing something similar to the SQL programming guide on the Apache Spark site, the column names produced from my Java bean class aren't consistent in their capitalization. Some start with a capitalized first letter and others don't, with no apparent pattern.
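A minimal bean along these lines (the names are made up to mirror mine, not my actual code, and I'm using the SparkSession API here just to keep the example self-contained) shows what I mean:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ColumnNameDemo {

    // Hypothetical bean, just to illustrate the naming behavior.
    public static class Record implements java.io.Serializable {
        private String iTrN;
        private long startTime;

        public String getITrN() { return iTrN; }
        public void setITrN(String iTrN) { this.iTrN = iTrN; }

        public long getStartTime() { return startTime; }
        public void setStartTime(long startTime) { this.startTime = startTime; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("column-name-demo")
                .master("local[*]")
                .getOrCreate();

        Record r = new Record();
        r.setITrN("abc");
        r.setStartTime(42L);

        Dataset<Row> df = spark.createDataFrame(Arrays.asList(r), Record.class);

        // One column comes out as "ITrN" (first letter kept upper case) and
        // the other as "startTime" (first letter lower-cased).
        df.printSchema();

        spark.stop();
    }
}
```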

There are some things I have done differently from that guide.

So, my questions are:

  1. Via reflection, how exactly is Spark getting the names? Does it use the get/set method names and just strip off the get and set prefixes?
  2. Is there a way to disable case sensitivity?

As for why I'm not showing any of my code: it's for work, so I want to avoid sharing anything I shouldn't.

--UPDATE-------------------------

So it looks like the name is based on the get and set methods: changing set/getStartTime to set/getStartTimee resulted in the column startTime becoming startTimee. However, I still see that a column like ITrN (from get/setITrN) keeps its upper case first letter, while a column like startTime doesn't.

--UPDATE #2-------------------------

After playing around with the names, it looks like the deciding factor is whether Spark thinks the name is an acronym (all caps), a word, or a single letter. If it starts with a word or a single letter, the first letter gets lower-cased. As a workaround I just started every name with "_". Anyway, if anyone knows how to disable case sensitivity when querying, let me know.
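The closest thing I've found so far is the spark.sql.caseSensitive setting, though I haven't verified it; something like this might work (untested, and it may depend on the Spark version):

```java
// Continuing from the sketch above (same spark / df variables).
// Untested idea: spark.sql.caseSensitive is supposed to control whether
// column names are resolved case-sensitively during analysis.
spark.conf().set("spark.sql.caseSensitive", "false");

df.createOrReplaceTempView("records");

// If case sensitivity is really off, this should resolve both columns even
// though the schema has them as "ITrN" and "startTime".
spark.sql("SELECT itrn, starttime FROM records").show();
```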

Upvotes: 2

Views: 1315

Answers (1)

Mr.UNOwen

Reputation: 183

Based on what I've thrown at it, a name with a capitalized first letter becomes lower case only if that first letter isn't followed by another upper case letter. So MMark stays MMark, while Mark turns into mark. I'm guessing this is to account for camel case in the get/set methods: someone who wants a variable called 'name' writes setName and getName for readability, and the first letter gets lowered back when the property name is derived.
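For what it's worth, this matches the JavaBeans decapitalization rule in java.beans.Introspector, which I believe Spark's bean reflection goes through: the first character is lower-cased unless the first two characters are both upper case. A quick standalone check (not my work code):

```java
import java.beans.Introspector;

public class DecapitalizeCheck {
    public static void main(String[] args) {
        // Introspector.decapitalize lower-cases the first character of a
        // property name unless the first two characters are both upper case.
        System.out.println(Introspector.decapitalize("Mark"));      // mark
        System.out.println(Introspector.decapitalize("MMark"));     // MMark
        System.out.println(Introspector.decapitalize("StartTime")); // startTime
        System.out.println(Introspector.decapitalize("ITrN"));      // ITrN
    }
}
```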

Upvotes: 2
