How to convert dense vector to a data frame in pyspark?

Question

I am trying to convert the list of dense vectors below, which I get by collecting the coefficients of multiple linear regression models, into a data frame.

lr_coefficients = []  # filled with one DenseVector per fitted model
lr_coefficients.append(lr_model.coefficients)  # repeated for each lr_model
lr_coefficients

[DenseVector([-0.0009, -0.2476, 0.5486, 0.396]),
 DenseVector([-0.0016, -1.5333, 0.4269, 0.4363]),
 DenseVector([-0.0492, 0.0, 0.2077, 0.7548]),
 DenseVector([-0.001, -1.2098, 0.545, 0.4148]),
 DenseVector([-0.0001, 0.0, 0.575, 0.3638]),
 DenseVector([-0.001, -1.3361, 0.5402, 0.4113]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
 DenseVector([-0.0049, -1.5534, 0.5747, 0.3934])]

I want each coefficient in its own column, like the table below.

I tried the approach in the link below, but it did not work for me.

Convert a Dense Vector to a Dataframe using Pyspark

Raghu · Accepted Answer

Well, you have not mentioned what you tried. The problem is probably that you have a list of DenseVector objects, so toArray() has to be applied to every element of the list:

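from pyspark.ml.linalg import DenseVector

tst_vct = [DenseVector([6603.0, 332.0, 65.8, -0.19]),
           DenseVector([6613.0, 514.0, 60.7, -0.1238]),
           DenseVector([6708.0, 487.0, 60.6, -0.1481]),
           DenseVector([6446.0, 2538.0, 14.0, -0.0178])]
# Convert each vector to a plain Python list of floats
tst_arr = [x.toArray().tolist() for x in tst_vct]
# Create a DataFrame from the list of rows
# (sqlContext is predefined in older PySpark shells; in Spark 2.0+, spark.createDataFrame works the same way)
tst_df = sqlContext.createDataFrame(tst_arr)
tst_df.show()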
+------+------+------------------+--------------------+
|    _1|    _2|                _3|                  _4|
+------+------+------------------+--------------------+
|6603.0| 332.0| 65.80000000000001| -0.1900000000000067|
|6613.0| 514.0| 60.70000000000002| -0.1238281250000007|
|6708.0| 487.0|60.600000000000016| -0.1481404958677686|
|6446.0|2538.0|              14.0|-0.01775147928994083|
+------+------+------------------+--------------------+
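
For the coefficient list in the question, the same idea can be wrapped in a small helper that also gives the columns readable names instead of _1, _2, and so on. This is only a minimal sketch: the helper names vectors_to_rows, coefficient_columns and to_dataframe, and the coef_ prefix, are made up for illustration, and an existing SparkSession called spark is assumed.

```python
def vectors_to_rows(vectors):
    """Turn objects exposing toArray() (e.g. DenseVector) into lists of Python floats."""
    # float() strips numpy scalar types, which createDataFrame cannot infer a schema from
    return [[float(value) for value in vec.toArray()] for vec in vectors]


def coefficient_columns(n, prefix="coef_"):
    """Build hypothetical column names: coef_0, coef_1, ..."""
    return [f"{prefix}{i}" for i in range(n)]


def to_dataframe(spark, vectors):
    """Create a DataFrame with one row per vector and one column per coefficient."""
    rows = vectors_to_rows(vectors)
    if not rows:
        raise ValueError("need at least one vector to determine the number of columns")
    # A list of strings passed as schema is used as the column names
    return spark.createDataFrame(rows, schema=coefficient_columns(len(rows[0])))


# Assumed usage, with the lr_coefficients list from the question:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   to_dataframe(spark, lr_coefficients).show()
```

All vectors are assumed to have the same length, as they do when every model is fitted on the same features.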

there you go:-)
