pingboing

Reputation: 69

Manually create dataframe with date column

I am reading some example code from the PySpark documentation:

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext

One of the examples creates a DataFrame like this:

# assumes an existing SparkSession named `spark`
from pyspark.sql.functions import add_months

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.select(add_months(df.dt, 1).alias('next_month')).collect()
# [Row(next_month=datetime.date(2015, 5, 8))]

I am wondering why there must be a comma after '2015-04-08' when there is only one column. I suspect it has something to do with the tuple type, but I would like to learn more.

Upvotes: 2

Views: 1776

Answers (1)

Shantanu Sharma

Reputation: 4089

A single-element tuple needs a trailing comma (',') to distinguish it from a parenthesized expression such as (1), which is just the integer 1. The examples below should make this clearer.

Parenthesized expression (not a tuple):

a = (1)
type(a)
#int

Tuple with a single element:

b = (1,)
type(b)
#tuple
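Strictly speaking it is the comma, not the parentheses, that makes the tuple. A minimal sketch in plain Python (the variable names are only illustrative):

```python
# The comma creates the tuple; the parentheses only group.
c = 1,                # no parentheses, still a tuple
d = (1)               # parentheses without a comma: plain int
e = ('2015-04-08',)   # one-element tuple holding a string

print(type(c).__name__)  # tuple
print(type(d).__name__)  # int
print(len(e))            # 1
```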

You can define a zero-element tuple with empty parentheses:

zero_element_tuple = ()
type(zero_element_tuple)
#tuple

Only a single-element tuple requires the extra comma (',') to distinguish it from a parenthesized expression; a tuple with multiple elements does not need a trailing comma.
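Applied to the question: each element of the list passed to createDataFrame is one row, so a row with one column has to be a one-element tuple. The sketch below uses plain Python only (no Spark session needed) to show what the list contains with and without the comma; the variable names are hypothetical.

```python
# With the comma: a list containing one row, and that row is a 1-tuple.
rows_with_comma = [('2015-04-08',)]

# Without the comma: the parentheses vanish and the list holds a bare string.
rows_without_comma = [('2015-04-08')]

print(type(rows_with_comma[0]).__name__)     # tuple
print(type(rows_without_comma[0]).__name__)  # str

# A trailing comma is allowed, but optional, once there are two or more elements.
print((1, 2,) == (1, 2))                     # True
```

Without the comma, Spark would be handed a bare string instead of a row with a single field, which is why the documentation example writes ('2015-04-08',).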

Upvotes: 1
