Sushant Bharti

Reputation: 149

Count number of columns in a PySpark DataFrame?

I have a dataframe with 15 columns (4 categorical and the rest numeric).

I have created dummy variables for every categorical variable. Now I want to find the number of variables in my new dataframe.

I tried taking the length of printSchema()'s result, but it is NoneType:

print type(df.printSchema())

Upvotes: 9

Views: 53483

Answers (1)

Rakesh Kumar

Reputation: 4420

You are going about it the wrong way. Here is a sample example, along with an explanation of printSchema:

df = sqlContext.createDataFrame([
    (1, "A", "X1"),
    (2, "B", "X2"),
    (3, "B", "X3"),
    (1, "B", "X3"),
    (2, "C", "X2"),
    (3, "C", "X2"),
    (1, "C", "X1"),
    (1, "B", "X1"),
], ["ID", "TYPE", "CODE"])


# Python 2:
print len(df.columns) #3
# Python 3
print(len(df.columns)) #3

columns returns a list of all column names, so you can take its len. printSchema, by contrast, only prints the schema of the df (the columns and their data types) and returns None, e.g.:

root
 |-- ID: long (nullable = true)
 |-- TYPE: string (nullable = true)
 |-- CODE: string (nullable = true)

Upvotes: 23
