Running ML algorithms on existing dataframes

Question

I'm new to Spark and I'm trying to figure out the usual procedure for doing data science with it. Concretely, I know how to create DataFrames from existing data and run some analysis on them.

Now I'm trying to understand how to run ML algorithms on data that is already in DataFrames. In the ML documentation, the DataFrames are built from Vectors (dense or sparse), but that is not the case with my existing DataFrames. How do I convert an existing DataFrame with a number of columns into a DataFrame where those columns are combined into a single vector column?

What is the usual procedure for doing exploratory analysis and some plots first, and then running ML on the same DataFrame?

Answers (1)
