Reputation: 11
I have tweets labeled as retweeted or not retweeted, and I have to use logistic regression to build a model that predicts whether a tweet will be retweeted.
The problem is that I don't know how to use multiple features with logistic regression. The features I have to use are tf-idf, LDA, whether a tweet has been retweeted, and how many times tweets from a given user have been retweeted in the past.
How can I use these 4 features in binary classification? Any help would be greatly appreciated.
Upvotes: 1
Views: 2088
Reputation: 3514
Here's just an example using the classifier's default parameters. The idea is that the same procedure applies whether you have two features or more:
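import numpy as np
from sklearn.linear_model import LogisticRegression
num_rows = 100  # placeholder value; set this to the number of labeled tweets
# one row per tweet, one column per feature (3 columns here; use as many as you have)
dataset = np.zeros(shape=(num_rows, 3), dtype=np.float32)
# 1-D label vector: 1 = retweeted, 0 = not retweeted
retweeted_output = np.zeros(shape=(num_rows,), dtype=np.int64)
# perform some actions to fill your data structures (both classes must be present)
model = LogisticRegression()
model.fit(dataset, retweeted_output)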
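To make the "same procedure" point concrete for the features in the question, here is a rough sketch of one way to put tf-idf, LDA topic proportions, and a per-user retweet count into a single matrix. The tweets, counts, and labels are invented toy data, and the names `past_retweets`, `topics`, and `history` are my own. I left out "whether a tweet has been retweeted" on the assumption that it is the label being predicted, not an input. Each feature group just contributes its own columns, and `scipy.sparse.hstack` joins them side by side.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data, invented for illustration only.
tweets = [
    "new phone release today",
    "phone battery review",
    "football match tonight",
    "great football goal",
    "phone camera test",
    "match ends in a draw",
]
# Assumed feature: how often each author's tweets were retweeted in the past.
past_retweets = np.array([120, 80, 3, 5, 95, 1], dtype=np.float64)
labels = np.array([1, 1, 0, 0, 1, 0])  # 1 = retweeted, 0 = not retweeted

# Feature group 1: tf-idf, one column per vocabulary term (sparse).
tfidf = TfidfVectorizer().fit_transform(tweets)

# Feature group 2: LDA topic proportions, fitted on raw term counts (dense).
counts = CountVectorizer().fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)

# Feature group 3: log-scaled past retweet count as a single column.
history = np.log1p(past_retweets).reshape(-1, 1)

# Stack all groups column-wise: one row per tweet, all features side by side.
X = hstack([tfidf, csr_matrix(topics), csr_matrix(history)]).tocsr()

model = LogisticRegression()
model.fit(X, labels)
predictions = model.predict(X)
print(X.shape, predictions)
```

On real data you would fit the vectorizers and LDA on the training split only and evaluate on held-out tweets; predicting on the training rows here is only to show that the pieces fit together.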
Upvotes: 2