snnmst
snnmst

Reputation: 43

Can I apply "classification" first and then "regression" to the same data set?

I am a beginner in data science and need help with a topic.

I have a data set about the customers of an institution. My goal is to first find out which customers will pay to this institution and then find out how much money the paying customers will pay.

In this context, I think that I can first find out which customers will pay by "classification" and then how much will pay by applying "regression".

So, first I want to apply "classification" and then apply "regression" to this output. How can I do that?

Upvotes: 2

Views: 2725

Answers (3)

Marc
Marc

Reputation: 2410

Sure, you can definitely apply a classification method followed by regression analysis. This is actually a common pattern during exploratory data analysis.

For your use case, based on the basic info you are sharing, I would intuitively go for 1) logistic regression and 2) multiple linear regression.

Logistic regression is actually a classification tool, even though the name suggests otherwise. In a binary logistic regression model, the dependent variable has two levels (categorical), which is what you need to predict if your customers will pay vs. will not pay (binary decision)

The multiple linear regression, applied to the same independent variables from your available dataset, will then provide you with a linear model to predict how much your customers will pay (ie. the output of the inference will be a continuous variable - the actual expected dollar value).

That would be the approach I would recommend to implement, since you are new to this field. Now, there are obviously many different other ways to define these models, based on available data, nature of the data, customer requirements and so on, but the logistic + multiple regression approach should be a sure bet to get you going.

Upvotes: 2

Sam Oz
Sam Oz

Reputation: 124

Of course you can do it by vertically stacking models. Assuming that you are using binary classification, after prediction you will have a dataframe with target values 0 and 1. You are going to filter where target==1 and create a new dataframe. Then run the regression.

Also, rather than classification, you can use clustering if you don't have labels since the cost is lower.

Upvotes: 1

Nikaido
Nikaido

Reputation: 4629

Another approach would be to make it a pure regression only. Without working on a cascade of models. Which will be more simple to handle

For example, you could associate to the people that are not willing to pay the value 0 to the spended amount, and fit the model on these instances.

For the business, you could then apply a threshold in which if the predicted amount is under a more or less fixed threshold, you classify the user as "non willing to pay"

Upvotes: 1

Related Questions