I have a dataframe with 15 columns (4 categorical and the rest numeric).
I have created dummy variables for every categorical variable. Now I want to find the number of variables in my new dataframe.
I tried calculating the length of printSchema(), but its result is NoneType:
print type(df.printSchema())
You are going about it the wrong way. Here is an example of the right approach, along with what printSchema actually does:
df = sqlContext.createDataFrame([
(1, "A", "X1"),
(2, "B", "X2"),
(3, "B", "X3"),
(1, "B", "X3"),
(2, "C", "X2"),
(3, "C", "X2"),
(1, "C", "X1"),
(1, "B", "X1"),
], ["ID", "TYPE", "CODE"])
# Python 2:
print len(df.columns) #3
# Python 3:
print(len(df.columns)) #3
df.columns returns a list of all column names, so len() gives the column count. printSchema, by contrast, only prints the schema of the DataFrame (its columns and their data types) and returns None, for example:
root
|-- ID: long (nullable = true)
|-- TYPE: string (nullable = true)
|-- CODE: string (nullable = true)