Manually create dataframe with date column

Question

I am reading an example from the PySpark documentation:

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SQLContext

The example creates a DataFrame like this:

from pyspark.sql.functions import add_months

df = spark.createDataFrame([('2015-04-08',)], ['dt'])
df.select(add_months(df.dt, 1).alias('next_month')).collect()
# [Row(next_month=datetime.date(2015, 5, 8))]

I am wondering why there must be a comma after '2015-04-08' when there is only one column. I suspect it has something to do with the tuple type, but I would like to learn more.

Shantanu Sharma · Accepted Answer

A single-element tuple needs a trailing comma (',') to distinguish it from a parenthesized expression such as (1). The examples below should make this clearer.

Arithmetic expression:

a = (1)
type(a)
#int

Tuple with a single element:

b = (1,)
type(b)
#tuple

You can define a zero-element tuple with empty parentheses.

zero_element_tuple = ()
type(zero_element_tuple)
#tuple

Only a single-element tuple requires the additional comma (',') to distinguish it from an arithmetic expression; a tuple with multiple elements does not require an additional comma at the end.
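To tie this back to the question, here is a small sketch in plain Python (no Spark session needed) that applies the same rule to the date string. The variable names are only illustrative and do not come from the PySpark documentation.

```python
# Parentheses alone only group an expression; the comma is what builds a tuple.
without_comma = ('2015-04-08')   # just a str wrapped in parentheses
with_comma = ('2015-04-08',)     # a one-element tuple

print(type(without_comma).__name__)  # str
print(type(with_comma).__name__)     # tuple

# The parentheses are optional here; the trailing comma alone is enough.
bare = '2015-04-08',
print(bare == with_comma)            # True

# With two or more elements the separating commas already mark a tuple,
# so a trailing comma is allowed but not needed.
print((1, 2) == (1, 2,))             # True

# The data argument from the question: a list holding one one-element tuple,
# i.e. one row with one column.
rows = [('2015-04-08',)]
print(len(rows), len(rows[0]))       # 1 1
```

Without the comma, the list in the question would hold a bare string instead of a one-column row, which is why the documentation example writes ('2015-04-08',).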
