I am trying to convert the DenseVectors below, which I get by taking the coefficients of multiple linear regression models, into a data frame.
lr_coefficients = []
# after fitting each model, append its coefficient vector to the list
lr_coefficients.append(lr_model.coefficients)
lr_coefficients
[DenseVector([-0.0009, -0.2476, 0.5486, 0.396]),
DenseVector([-0.0016, -1.5333, 0.4269, 0.4363]),
DenseVector([-0.0492, 0.0, 0.2077, 0.7548]),
DenseVector([-0.001, -1.2098, 0.545, 0.4148]),
DenseVector([-0.0001, 0.0, 0.575, 0.3638]),
DenseVector([-0.001, -1.3361, 0.5402, 0.4113]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934]),
DenseVector([-0.0049, -1.5534, 0.5747, 0.3934])]
I want each coefficient in its own column, like the table below.
I have tried the approach in the question below, but it did not work for me:
Convert a Dense Vector to a Dataframe using Pyspark
Well, you haven't said what exactly you tried. The problem is probably that you have a list of DenseVectors, not a single vector, so toArray() has to be applied to every element of the list.
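from pyspark.ml.linalg import DenseVector
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tst_vct = [DenseVector([6603.0, 332.0, 65.8, -0.19]),
           DenseVector([6613.0, 514.0, 60.7, -0.1238]),
           DenseVector([6708.0, 487.0, 60.6, -0.1481]),
           DenseVector([6446.0, 2538.0, 14.0, -0.0178])]
# Convert each vector to a plain Python list of floats
tst_arr = [x.toArray().tolist() for x in tst_vct]
# Create a DataFrame from the list of lists; each position becomes a column
tst_df = spark.createDataFrame(tst_arr)
tst_df.show()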
+------+------+------------------+--------------------+
| _1| _2| _3| _4|
+------+------+------------------+--------------------+
|6603.0| 332.0| 65.80000000000001| -0.1900000000000067|
|6613.0| 514.0| 60.70000000000002| -0.1238281250000007|
|6708.0| 487.0|60.600000000000016| -0.1481404958677686|
|6446.0|2538.0| 14.0|-0.01775147928994083|
+------+------+------------------+--------------------+
there you go:-)