Aishani Singh

Reputation: 29

PySpark- getting default column name as "value" in the dataframe

So I have a dataframe, df2 ,which looks like:

[screenshot of df2: a single column of float values]

I had to convert the values to python float type because of errors-

df2 = spark.createDataFrame([float(x) for x in data],FloatType())

Presumably because of this, I'm getting the default column name "value", but I want the column to be named "Result". I tried renaming it with the withColumnRenamed() method, but the output is unchanged. Any idea how I can change the default column name?

Upvotes: 0

Views: 823

Answers (2)

viggnah

Reputation: 1877

I think you're calling withColumnRenamed() but not assigning the result back to df2. DataFrames are immutable, so the method returns a new DataFrame rather than modifying the original:

df2 = df2.withColumnRenamed("value", "Result")

Or, during DataFrame creation, you can pass a schema with the column name you want:

from pyspark.sql.types import StructType, StructField, FloatType
schema = StructType([StructField("Result", FloatType(), True)])
df2 = spark.createDataFrame([float(x) for x in data], schema)

Upvotes: 1

Linus

Reputation: 669

You can try this:

d1= [(0.0,), (0.0,), (0.0,), (5.0,), (57.0,), (142.0,)]
df1 = spark.createDataFrame(d1, 'value float')
df1.printSchema()

# root
#  |-- value: float (nullable = true)

df1.show()
# +-----+
# |value|
# +-----+
# |  0.0|
# |  0.0|
# |  0.0|
# |  5.0|
# | 57.0|
# |142.0|
# +-----+

Upvotes: 0
