Reputation: 11
I keep getting error messages whenever I try to run a CoxPH regression in Python. I'm not a pro in Python; I'm still learning.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test
from lifelines.statistics import logrank_test
from lifelines import CoxPHFitter
import pyreadstat
After loading the data, I convert the columns to integers:
data["faculty2"] = data["faculty2"].astype(int)
data["sex"] = data["sex"].astype(int)
data["mos"] = data["mos"].astype(int)
data["state2"] = data["state2"].astype(int)
data["ss"] = data["ss"].astype(int)
data["supervisor"] = data["supervisor"].astype(int)
data["time"] = data["time"].astype(int)
data["event"] = data["event"].astype(int)
Eventvar = data['event']
Timevar = data['time']
# Assign labels to the coded values
data['sex'] = data['sex'].apply({1:'Male', 0:'female'}.get)
data['faculty2'] = data['faculty2'].apply({1:'Arts', 2:'Sciences', 3:'Medicals',
                                           4:'Agriculture', 5:'Social Sciences', 6:'Education',
                                           7:'Tech', 8:'Law', 9:'Institues'}.get)
data['state2'] = data['state2'].apply({1:'SW',2:'SS',3:'SE',4:'NC', 5:'NE',6:'NW'}.get)
data['ss'] = data['ss'].apply({1:'Yes', 0:'No'}.get)
data['mos'] = data['mos'].apply({1:'Full Time', 0:'Part Time'}.get)
cf = CoxPHFitter()
cf.fit(data, duration_col='time', event_col='event', show_progress=True)
cf.print_summary()
I get this error message when I run this code:
ValueError: could not convert string to float: 'Arts'
Please, I need help; I don't know how to go about this. If I add dummy variables, I get a different error message:
ohe_features = ['faculty2', 'sex', 'mos','state2','ss']
data = pd.get_dummies(data,drop_first=True,columns=ohe_features)
This is the error message I get
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model
Matrix is singular
If I run the code without assigning labels to the values and without adding dummies, it runs, but the individual levels are not shown. The variables are treated as if they were continuous.
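Since the ConvergenceError points at collinearity, one way to narrow it down is to inspect the dummy-encoded covariates before fitting. The following is only a minimal sketch on made-up toy data (the `toy` frame and its values are hypothetical, not the real dataset); it flags constant columns, near-duplicate column pairs, and rank deficiency of the centered covariate matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
# Here "sex" and "mos" vary together on purpose, to mimic collinearity.
toy = pd.DataFrame({
    "time": [5, 8, 12, 3, 9, 7],
    "event": [1, 0, 1, 1, 0, 1],
    "sex": ["Male", "female", "Male", "female", "Male", "female"],
    "mos": ["Full Time", "Part Time", "Full Time",
            "Part Time", "Full Time", "Part Time"],
})

# Same encoding step as in the question, on the toy columns.
encoded = pd.get_dummies(toy, columns=["sex", "mos"],
                         drop_first=True, dtype=float)
covariates = encoded.drop(columns=["time", "event"])

# 1) Columns with a single value carry no information.
constant_cols = [c for c in covariates.columns
                 if covariates[c].nunique() <= 1]

# 2) Pairs of columns that are (almost) perfectly correlated.
corr = covariates.corr().abs()
duplicate_pairs = [(a, b)
                   for i, a in enumerate(corr.columns)
                   for b in corr.columns[i + 1:]
                   if corr.loc[a, b] > 0.999]

# 3) Rank of the centered matrix; a rank below the number of
#    columns means some column is a linear combination of others.
centered = covariates - covariates.mean()
rank = np.linalg.matrix_rank(centered.to_numpy())
is_collinear = rank < covariates.shape[1]

print("constant columns:", constant_cols)
print("duplicate pairs:", duplicate_pairs)
print("rank", rank, "of", covariates.shape[1], "-> collinear:", is_collinear)
```

On the real data, any column reported by these checks would be a candidate to drop or merge with another level before calling `fit`.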
Upvotes: 1
Views: 4290
Reputation: 1
I had a nearly identical problem. I changed
cph = CoxPHFitter()
to
cph = CoxPHFitter(penalizer=0.0001)
This solved the issue.
Upvotes: 0
Reputation: 21
In the lifelines documentation they suggest
Upvotes: 1