user2774120

Reputation: 147

Error when creating a DataFrame from two columns in Python pandas

When I try to create a DataFrame from two columns, pids and predictions_kaggle (to be stored as SalePrice), I get the error "Exception: Data must be 1-dimensional". I think the error occurs because the two data series have different formats, as shown below. How can I make them the same?

ksubmission = pd.DataFrame({'Id':pids,'SalePrice':predictions_kaggle})

Exception: Data must be 1-dimensional

pids.shape

(1459,)

predictions_kaggle.shape

(1459, 1)

predictions_kaggle looks like this:

array([[115901.20520943],
       [144313.70246636],
       [165320.94012928],
       ...,
       [155759.14767572],
       [111175.64223766],
       [249104.99042467]])

while pids looks like this:

0       1461
1       1462
2       1463
3       1464
4       1465
        ... 
1454    2915
1455    2916
1456    2917
1457    2918
1458    2919
Name: Id, Length: 1459, dtype: int64

Upvotes: 1

Views: 139

Answers (2)

Seraph Wedd

Reputation: 864

The problem is that predictions_kaggle is a 2-D array, not a 1-D one. A 1-D array has a shape of the form (n,), but yours has shape (n, 1), which means each row is itself a one-element array. A quick fix is to flatten it, which returns a 1-D copy:

ksubmission = pd.DataFrame({'Id':pids,'SalePrice':predictions_kaggle.flatten()})
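To see the shape change for yourself, here is a minimal sketch. The three-row pids and predictions_kaggle below are made-up stand-ins for the real data (only the shapes match the question), and ravel() is used as an alternative to flatten() that avoids a copy when it can:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, shaped like the inputs in the question
pids = pd.Series([1461, 1462, 1463], name="Id")
predictions_kaggle = np.array([[115901.2], [144313.7], [165320.9]])

print(predictions_kaggle.shape)  # 2-D: one column, one row per prediction

# ravel() collapses the array to 1-D (flatten() does the same but always copies)
flat = predictions_kaggle.ravel()
print(flat.shape)                # now 1-D

# With both inputs 1-D and of equal length, the constructor accepts them
ksubmission = pd.DataFrame({"Id": pids, "SalePrice": flat})
print(ksubmission)
```

Selecting the single column with predictions_kaggle[:, 0] should work equally well here.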

Hope this helps.

Upvotes: 1

oppressionslayer

Reputation: 7224

I think either of the following should work, provided the two have the same length:

import pandas as pd
import numpy as np
pd.DataFrame(predictions_kaggle, index=pids).reset_index().rename(columns={'index': 'Id', 0:'SalePrice'}) 

or

pd.DataFrame({'Id':pids,'SalePrice':np.ndarray.flatten(predictions_kaggle)}) 
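As a rough check that the two variants agree, the sketch below runs both on small made-up data (the values and the names by_index and by_flatten are illustrative, not from the question) and compares the resulting columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data, shaped like the inputs in the question
pids = pd.Series([1461, 1462, 1463], name="Id")
predictions_kaggle = np.array([[115901.2], [144313.7], [165320.9]])

# Variant 1: use pids as the index, then move it back into a column
by_index = (
    pd.DataFrame(predictions_kaggle, index=pids)
    .reset_index()
    .rename(columns={"index": "Id", 0: "SalePrice"})
)

# Variant 2: flatten the (n, 1) array to 1-D before building the frame
by_flatten = pd.DataFrame({"Id": pids, "SalePrice": predictions_kaggle.flatten()})

# Both variants should carry the same ids and predictions
print(by_index["Id"].tolist() == by_flatten["Id"].tolist())
print(by_index["SalePrice"].tolist() == by_flatten["SalePrice"].tolist())
```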

Upvotes: 1
