Shankar Panda

Reputation: 832

PySpark - How to calculate the min and max value of each field using PySpark?

I am trying to find the min and max of each field returned by the SQL statement and write them to a CSV file. I would like the result in the format shown below. Could you please help? I have already written this in Python, but now I am trying to convert it to PySpark so it can run directly on the Hadoop cluster.

[Screenshot of the desired output format]

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import col, max, min, mean, stddev

sc = SparkContext()
hive_context = HiveContext(sc)

#bank = hive_context.table("cip_utilities.file_upload_temp")
data = hive_context.sql("select * from cip_utilities.cdm_variables_dict")

# Register the table's schema so the non-string columns can be picked out
hive_context.sql("describe cip_utilities.cdm_variables_dict").registerTempTable("schema_def")
temp_data = hive_context.sql("select * from schema_def")
temp_data.show()

data1 = hive_context.sql("select col_name from schema_def where data_type <> 'string'")
column_names_as_python_list_of_rows = data1.collect()
#data1.show()

for line in column_names_as_python_list_of_rows:
    # Here I need to calculate min, max, mean etc. for the field named in line.col_name
    pass

Upvotes: 3

Views: 31414

Answers (1)

Neeraj Bhadani

Reputation: 3110

There are different functions you can use to find min and max values. Here is one way to get these statistics for DataFrame columns, using the agg function.

from pyspark.sql.functions import col, max, min

# Aggregate min and max for the required columns in a single pass over the data
df = spark.table("HIVE_DB.HIVE_TABLE")
df.agg(min(col("col_1")), max(col("col_1")), min(col("col_2")), max(col("col_2"))).show()
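If, as in the question, you want these statistics for every non-string column without naming each one, you can build the aggregation list dynamically. This is a minimal sketch, assuming df is the DataFrame above:

from pyspark.sql.functions import col, max, min, mean

# Collect the names of all non-string columns from the DataFrame's schema
numeric_cols = [f.name for f in df.schema.fields if f.dataType.typeName() != "string"]
# One min/max/mean expression per column, all evaluated in a single pass
aggs = [fn(col(c)) for c in numeric_cols for fn in (min, max, mean)]
df.agg(*aggs).show()

The resulting single-row DataFrame can also be written out with df.agg(*aggs).write.csv(path), which covers the CSV requirement in the question.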

However, you can also explore the describe and summary (Spark 2.3 onwards) functions to get basic statistics for the various columns in your DataFrame.
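For example, a minimal sketch reusing the df from the snippet above (the column names are placeholders):

df.describe("col_1", "col_2").show()     # count, mean, stddev, min, max
df.summary("min", "max", "mean").show()  # Spark 2.3+; pick exactly the statistics you need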

Hope this helps.

Upvotes: 11
