Abhishek Singh

Reputation: 35

List vs "List of List" as input in pd.DataFrame()

I am trying to figure out why df = pd.DataFrame([1,2,3]) puts 1, 2, 3 in a single column, whereas df = pd.DataFrame([[1,2,3]]) puts 1, 2, 3 in a single row.

If pd.DataFrame([1,2,3]) treats the list [1,2,3] as a series and puts it under a column then why doesn't pd.DataFrame([[1,2,3],[4,5,6]]) take the two lists as two series and create a DataFrame with two columns?

Instead, it puts the two lists as rows!

Upvotes: 1

Views: 138

Answers (1)

Akshay Sehgal

Reputation: 19322

It's quite easy to understand.

First case - [1,2,3] contains 3 objects, each a single int. Pandas therefore creates 3 rows to hold them. Since each object is a single int, only 1 column/feature is added.

Second case - [[1,2,3],[4,5,6]] is a list of lists. It contains 2 objects, each of which is a 3-element list. Therefore, pandas creates 2 rows to hold the 2 objects. Since each object is a 3-element list, it creates 3 columns/features to store them.

Third case - [[1,2,3]] is a list containing a single object, which is a 3-element list. So, as before, pandas creates a single row with 3 columns to store it.

Fourth case - (As @Scott posted in his answer) [[[1,2,3]],[[4,5,6]]] is a list of lists of lists. It contains 2 objects, each of which is a list of lists, so there will be 2 rows. However, each of these objects contains only a single item (for example, [[1,2,3]] contains only [1,2,3]), so there will be a single column and each entry will hold a list!
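The fourth case can be checked directly by looking at what a single cell holds. This is only a minimal sketch; the names df_d and first_cell are made up for illustration.

```python
import pandas as pd

d = [[[1, 2, 3]], [[4, 5, 6]]]
df_d = pd.DataFrame(d)

# Two outer objects give two rows; each holds one item, so one column
print(df_d.shape)

# The single cell in the first row stores the whole inner list
first_cell = df_d.iloc[0, 0]
print(type(first_cell), first_cell)
```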

An easier way to understand this is to compare the shape of the equivalent numpy array (np.array(...).shape) with the shape of the DataFrame.

import numpy as np
import pandas as pd

a = [1,2,3]
b = [[1,2,3], [4,5,6]]
c = [[1,2,3]]
d = [[[1,2,3]],[[4,5,6]]]

print(np.array(a).shape, pd.DataFrame(a).shape)
print(np.array(b).shape, pd.DataFrame(b).shape)
print(np.array(c).shape, pd.DataFrame(c).shape)
print(np.array(d).shape, pd.DataFrame(d).shape)
(3,) (3, 1)       #3 rows, 1 column
(2, 3) (2, 3)     #2 rows, 3 columns
(1, 3) (1, 3)     #1 row, 3 columns
(2, 1, 3) (2, 1)  #2 rows, 1 column (and each cell will hold a 3 length list)!

Here, the numpy array shape matches the (rows, columns) shape of the pandas dataframe, with a 1-D array treated as a single column. In the last case, only the first two axes are used as rows and columns respectively. Whatever remains after stripping those axes is stored directly in the cells of the dataframe.
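If the goal is for each inner list to become a column instead of a row, two common options are to pass a dict of lists or to transpose the row-wise result. The sketch below is one possible approach; the column names "x" and "y" are hypothetical.

```python
import pandas as pd

rows = [[1, 2, 3], [4, 5, 6]]

# Option 1: a dict maps each (hypothetical) column name to one list
by_dict = pd.DataFrame({"x": rows[0], "y": rows[1]})

# Option 2: build the frame row-wise as usual, then transpose it
by_transpose = pd.DataFrame(rows).T

# Both frames have 3 rows and 2 columns
print(by_dict.shape, by_transpose.shape)
```

With the dict form the columns get explicit names, while the transposed frame keeps the default integer labels 0 and 1.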

Upvotes: 2
