SCool

Reputation: 3375

I can't convert a NumPy array to a pandas column: "Data must be 1-dimensional" / IndexError

I have read many suggested solutions but I can't make this work.

I have an array of predictions:

y_prob = best_model.predict_proba(data)

print(y_prob)

array([[0.32],
       [0.5 ],
       [0.32],
       ...,
       [0.46],
       [0.51],
       [0.51]], dtype=float32)

print(y_prob.shape)

(48775, 1)

I have been trying to add this to the original data frame as a column of predictions, but nothing I try works.

# attempt 1

data['probability'] = pd.Series(y_prob)

Exception: Data must be 1-dimensional


# attempt 2

data['probability'] = y_prob

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

# attempt 3

data['probability'] = y_prob.tolist()

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

# attempt 4

data['probability'] = [i[0] for i in y_prob]

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

# attempt etc etc etc.

I know it's probably a stupid mistake, but I just can't find the solution.

Data dimensions:


print(y_prob.shape)
print(data.shape)

(48775, 1)
(48775, 121)

Edit: I tried the suggestions from the comments:

data['probability'] = pd.Series(y_prob.reshape((y_prob.shape[0],)))

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices



data['probability'] = y_prob.ravel()

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


data['probability'] = pd.Series(y_prob.ravel())

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
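
One thing that may be worth checking: the IndexError text above is the message NumPy raises when an ndarray is indexed with a string, so `data` could be a NumPy array rather than a pandas DataFrame at this point (for example after a scaler or other transform step). This is only a guess from the error message. A minimal sketch with made-up stand-in values, assuming `data` is an ndarray:

```python
import numpy as np
import pandas as pd

# Stand-ins for the objects in the question (values are made up).
y_prob = np.array([[0.32], [0.5], [0.46]], dtype=np.float32)
data = np.zeros((3, 4))  # assumption: `data` is an ndarray, not a DataFrame

message = None
try:
    # String keys are not valid ndarray indices, whatever the right-hand side is.
    data['probability'] = y_prob.ravel()
except IndexError as exc:
    message = str(exc)

print(type(data))  # quick check of what `data` really is
print(message)

# Wrapping the array in a DataFrame makes column assignment possible.
df = pd.DataFrame(data)
df['probability'] = y_prob.ravel()
```

If `type(data)` does report a DataFrame, this guess is wrong and the cause lies elsewhere.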

Upvotes: 2

Views: 2217

Answers (3)

George van Heerden

Reputation: 43

I tried this and it seemed to work. It's very simple:

data['probability'] = list(y_prob)
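
A note on what this produces: because `y_prob` has shape `(n, 1)`, `list(y_prob)` is a list of one-element arrays, so the new column ends up holding arrays rather than plain floats. A small sketch with a hypothetical three-row frame (the `feature` column and the values are made up) comparing it with a flattened assignment:

```python
import numpy as np
import pandas as pd

y_prob = np.array([[0.32], [0.5], [0.46]], dtype=np.float32)  # made-up values
df = pd.DataFrame({'feature': [1, 2, 3]})  # hypothetical stand-in for `data`

df['as_list'] = list(y_prob)      # each cell holds a 1-element array
df['as_floats'] = y_prob.ravel()  # each cell holds a plain float

print(df.dtypes)
```

The flattened version is usually easier to work with afterwards, since numeric operations such as `df['as_floats'].mean()` apply directly.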

Upvotes: 1

nimbous

Reputation: 1527

Try this:

data['probability'] = pd.Series(y_prob.flatten())
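
One caveat when assigning a Series: pandas aligns it to the frame on index labels, not on position. If the frame's index is not the default 0..n-1 (for instance after filtering rows), a bare `pd.Series(...)` can leave NaN in the new column. A sketch with a hypothetical frame and made-up values:

```python
import numpy as np
import pandas as pd

y_prob = np.array([[0.32], [0.5], [0.46]], dtype=np.float32)  # made-up values
# Hypothetical frame whose index is not 0..n-1.
df = pd.DataFrame({'feature': [10, 20, 30]}, index=[5, 6, 7])

# A bare Series gets index 0..2, which matches none of 5, 6, 7.
df['misaligned'] = pd.Series(y_prob.flatten())

# Passing the frame's index keeps the values in row order.
df['probability'] = pd.Series(y_prob.flatten(), index=df.index)

print(df)
```

Assigning the flattened array directly (`df['probability'] = y_prob.flatten()`) avoids alignment altogether, because a plain array is placed by position.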

Upvotes: 0

Samuel

Reputation: 420

Try:

data['probability'] = pd.Series(y_prob.reshape((y_prob.shape[0],)))

This should work.
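
For reference, the reshape above is one of several ways to turn an `(n, 1)` array into a 1-D array of length `n`. A short sketch with made-up values showing that they give the same result:

```python
import numpy as np

y_prob = np.array([[0.32], [0.5], [0.46]], dtype=np.float32)  # made-up values

flat_reshape = y_prob.reshape((y_prob.shape[0],))  # the form used above
flat_minus1 = y_prob.reshape(-1)                   # let NumPy infer the length
flat_ravel = y_prob.ravel()                        # returns a view when possible
flat_flatten = y_prob.flatten()                    # always returns a copy
flat_column = y_prob[:, 0]                         # select the only column

for arr in (flat_reshape, flat_minus1, flat_ravel, flat_flatten, flat_column):
    print(arr.shape)
```

The main practical difference is that `flatten()` copies the data, while `ravel()` and `reshape()` avoid a copy when the memory layout allows it.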

Upvotes: 1
