iNoob

Reputation: 1395

Pandas dataframe creation from a list

I'm getting the following error: Shape of passed values is (1, 5), indices imply (5, 5). From what I can tell, this suggests that the data doesn't match the column count, even though there are clearly five values and five column names. Initially I thought it could be due to using a list, but I get the same issue when passing in a NumPy array.

Can anyone point out my mistake? I'm clearly doing something incorrectly.

import pandas as pd

data = ['data1', 'data2', 'data3', 'data4', 'data5']
report_name = 'test.csv'
try:
    df = pd.DataFrame(data, columns=['column1', 'column2', 'column3', 'column4', 'column5'], index=None)
    df.sort_values('column1', ascending=True, inplace=True)
    df.to_csv(report_name, index=False)
except Exception as e:
    print(e)

Upvotes: 1

Views: 61

Answers (2)

Shivam Gaur

Reputation: 1062

You've missed the list brackets around data:

df = pd.DataFrame(data = [data], columns=['column1', 'column2', 'column3', 'column4', 'column5'], index=None)

Things to note: when given a list, pd.DataFrame() treats each element as one row. This means:

data = ['data1', 'data2', 'data3', 'data4', 'data5']
df = pd.DataFrame(data)
# Each element of the list `data` is treated as its own row
print(df)

Out[]:        0
         0  data1
         1  data2
         2  data3
         3  data4
         4  data5

As opposed to:

data = ['data1', 'data2', 'data3', 'data4', 'data5']
df = pd.DataFrame([data])
# The whole list `data` is treated as the first (and only) row
print(df)
Out[]:        0      1      2      3      4
         0  data1  data2  data3  data4  data5
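
To make the difference concrete, here is a minimal sketch that compares the resulting shapes. The variable names `flat` and `wrapped` are only illustrative; the data and column names are taken from the question.

```python
import pandas as pd

data = ['data1', 'data2', 'data3', 'data4', 'data5']
columns = ['column1', 'column2', 'column3', 'column4', 'column5']

# A flat list is read as five rows with one value each.
flat = pd.DataFrame(data)
print(flat.shape)  # (5, 1)

# Wrapping the list turns it into one row of five values,
# which lines up with the five column names.
wrapped = pd.DataFrame([data], columns=columns)
print(wrapped.shape)  # (1, 5)
```

Checking `.shape` before passing `columns` is a quick way to see whether the number of values per row matches the number of column names.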

Upvotes: 0

JMat

Reputation: 737

You have to pass a two-dimensional array to pd.DataFrame as the data if you fix the shape by passing columns:

data = [['data1', 'data2', 'data3', 'data4', 'data5']]
df = pd.DataFrame(data, columns=['column1', 'column2', 'column3', 'column4', 'column5'])
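
Since the question mentions seeing the same error with a NumPy array, the same idea can be sketched there too, assuming the array starts out one-dimensional. The names `arr` and `row` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

columns = ['column1', 'column2', 'column3', 'column4', 'column5']

# A one-dimensional array has shape (5,), so it cannot fill five columns.
arr = np.array(['data1', 'data2', 'data3', 'data4', 'data5'])

# reshape(1, -1) produces a 2D array with one row and five columns.
row = arr.reshape(1, -1)
print(row.shape)  # (1, 5)

df = pd.DataFrame(row, columns=columns)
print(df.shape)  # (1, 5)
```

Using `-1` lets NumPy infer the column count from the array length, so the reshape keeps working if the number of values changes.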

Upvotes: 1
