Reputation: 3884
I'm new to Spark and I'm trying to figure out the usual procedure for doing data science with it. Concretely, I know how to create DataFrames from existing data and run some analysis on them.
Now I'm trying to understand how to run ML algorithms on data that is already in DataFrames. In the ML documentation, the DataFrames are created from Vectors (dense or sparse), but that is not the case with my existing DataFrames. How do I convert an existing DataFrame with a number of columns into a DataFrame with a single column of vectors?
What is the usual procedure for doing exploratory analysis and some plots first, and then running ML on the same DataFrame?
Upvotes: 0
Views: 47
Reputation: 1712
org.apache.spark.ml.feature / pyspark.ml.feature contains a large number of feature extraction tools, which are extensively documented (Extracting, transforming and selecting features).
Upvotes: 1