Alexander Engelhardt

Reputation: 1712

Where are the PySpark docs' DataFrames df, df2, df3 etc. defined?

In the PySpark docs, I see many examples working on sample DataFrames like df4 here.

Where are they defined? I'd like to see them in full to better understand the docs.

Upvotes: 2

Views: 26

Answers (1)

notNull

Reputation: 31460

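They are defined in the _test() function at the bottom of the source file that contains the GroupedData class (pyspark/sql/group.py). That function builds the sample DataFrames and passes them to the doctest runner as globals, which is why the examples in the docs can use df, df3, df4 without defining them. For example, df4 is created like this: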

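from pyspark.sql import Row, SparkSession

# In the doctests, `sc` is the SparkContext that _test() sets up beforehand
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

df4 = sc.parallelize([
    Row(course="dotNET", year=2012, earnings=10000),
    Row(course="Java",   year=2012, earnings=20000),
    Row(course="dotNET", year=2012, earnings=5000),
    Row(course="dotNET", year=2013, earnings=48000),
    Row(course="Java",   year=2013, earnings=30000),
]).toDF()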
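
If it helps to see why a name like df4 can appear in a docstring without ever being defined there, here is a minimal sketch of the same mechanism using only the standard library's doctest module. The function double and the variable sample are made up for illustration and are not part of PySpark; the point is only that the runner receives a dict of globals, and the docstring examples see whatever is in it.

```python
import doctest


def double(x):
    """Return twice the input.

    >>> double(sample)
    42
    """
    return 2 * x


# `sample` is never defined inside the docstring. It is injected through
# the globals dict, in the same spirit as a _test() function injecting
# sample DataFrames such as df4 (names here are hypothetical).
globs = {"double": double, "sample": 21}

# Build a doctest from the docstring and run it against those globals.
test = doctest.DocTestParser().get_doctest(
    double.__doc__, globs, "double", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results)  # a TestResults tuple with the failed and attempted counts
```

So when a docs example uses an undefined name, the place to look is wherever the module's doctests are launched and their globals are assembled.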

Upvotes: 2
