Utku Şenel

Reputation: 31

Performing logistic regression analysis in Python using sklearn

I am trying to perform a logistic regression analysis, but I can't tell which part of my code is wrong. It raises an error on the line logistic_regression.fit(X_train, y_train), even though the code looks fine when I compare it with other sources. Can anybody help? Here is my code:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("/Users/utkusenel/Documents/Data Analyzing/data.csv", header=0, sep=";")
data = pd.DataFrame(df)

x = data.drop(columns=["churn"])  #features
y = data.churn  # target variable
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

Upvotes: 0

Views: 141

Answers (1)

Farid Jafri

Reputation: 356

There are multiple problems here.

  1. Your header row has a ';' at the end, so pandas will read an extra, empty column. Remove the ';' after churn.
  2. The training data you are passing in, X_train, contains several text/categorical columns. LogisticRegression only accepts numeric input, so you need to convert these columns into numbers first. Check out OneHotEncoder here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html and LabelEncoder here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html (note that LabelEncoder is intended for the target column rather than for feature columns).

After you have converted your text and categorical data to numbers and removed the extra ';' separator, run your algorithm again.
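As a rough sketch of both fixes together, something like the following might work. I don't have your actual file, so the CSV text and the column names age, plan and region are made up for illustration; only churn and the ';' separator come from your code. It also assumes the stray ';' is on the header row only, and it drops the resulting empty column in pandas instead of editing the file by hand.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for data.csv; note the trailing ';' on the header row.
csv_text = """age;plan;region;churn;
34;basic;north;0
51;premium;south;1
27;basic;east;0
45;premium;north;1
39;basic;south;0
62;premium;east;1
23;basic;north;0
58;premium;south;1
31;basic;east;0
49;premium;north;1
"""

df = pd.read_csv(io.StringIO(csv_text), header=0, sep=";")

# The trailing ';' produces an extra all-NaN column; drop any such column.
df = df.dropna(axis=1, how="all")

X = df.drop(columns=["churn"])  # features
y = df["churn"]                 # target variable

# Treat every non-numeric feature column as categorical.
categorical = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]

# One-hot encode the categorical columns and pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# max_iter is raised here only to give the solver room on unscaled features.
model = make_pipeline(preprocess, LogisticRegression(max_iter=1000))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Putting the encoder inside a pipeline means the same encoding is applied automatically when you call predict on X_test, so you don't have to transform the test set separately.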

Upvotes: 2
