Ayush Goyal

Reputation: 233

What does the output of this line for a pandas DataFrame signify?

I am learning pandas DataFrames and came across this code:

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

Now when I use print(list(df.columns.values)) as suggested on this page, the output is:

[0, 1, 2]

I am unable to understand the output. What do the values 0, 1, 2 signify? Since the height of the DataFrame is 2, I suppose the last value, 2, signifies the height. What about 0 and 1?

I apologize if this question is a duplicate. I couldn't find any relevant explanation. If there is a similar question, please post the link.

Many thanks.

Upvotes: 2

Views: 97

Answers (2)

piRSquared

Reputation: 294308

What is a data frame?

df is a data frame. Take a step back and consider what that means, outside of what it means from a Pandas perspective. Though different people mean slightly different things by a data frame, it is generally a table of data with rows and columns.

How do we reference those rows and/or columns?

Consider the example data frame df. I create a 4x4 table with a tuple in each cell giving the (row, column) position of that cell. Notice also that the row labels are ['A', 'B', 'C', 'D'] and the column labels are ['W', 'X', 'Y', 'Z'].

import pandas as pd

# Each cell holds its own (row, column) position
df = pd.DataFrame(
    [[(i, j) for j in range(4)] for i in range(4)],
    index=list('ABCD'), columns=list('WXYZ')
)

df

        W       X       Y       Z
A  (0, 0)  (0, 1)  (0, 2)  (0, 3)
B  (1, 0)  (1, 1)  (1, 2)  (1, 3)
C  (2, 0)  (2, 1)  (2, 2)  (2, 3)
D  (3, 0)  (3, 1)  (3, 2)  (3, 3)

If we wanted to reference by position, the cell in the zeroth row and third column is the one this styler highlights:

# Styler.applymap is deprecated since pandas 2.1 in favor of Styler.map (same signature)
df.style.applymap(lambda x: 'background: #aaf' if x == (0, 3) else '')


We can get at that position with iloc, which handles ordinal (positional) indexing:

df.iloc[0, 3]

(0, 3)

What makes Pandas special is that it gives us an alternative way to reference the rows and columns. We can reference them by their labels using loc, which handles label indexing:

df.loc['A', 'Z']

(0, 3)

I intentionally labeled the rows and columns with letters so as not to confuse label indexing with positional indexing. In your data frame, you let Pandas assign default labels to both the rows and the columns. Those defaults are the integers 0, 1, 2, ..., so at the start they coincide with the positions.
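To see those default labels in OP's own data frame, here is a minimal sketch (it assumes numpy and pandas are installed and simply rebuilds OP's array):

```python
import numpy as np
import pandas as pd

# OP's data frame: no index= or columns= given, so pandas uses default labels
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# The default labels are RangeIndex objects: 0, 1, 2, ... along each axis
print(df.columns)  # RangeIndex(start=0, stop=3, step=1)
print(df.index)    # RangeIndex(start=0, stop=2, step=1)

# Because label k sits at position k, label and positional lookups agree here
print(df.loc[1, 2], df.iloc[1, 2])  # 6 6
```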

What is the difference between label and positional indexing?

Consider this modified version of our data frame; let's call it df_:

df_ = df.sort_index(axis=1, ascending=False)

df_

        Z       Y       X       W
A  (0, 3)  (0, 2)  (0, 1)  (0, 0)
B  (1, 3)  (1, 2)  (1, 1)  (1, 0)
C  (2, 3)  (2, 2)  (2, 1)  (2, 0)
D  (3, 3)  (3, 2)  (3, 1)  (3, 0)

Notice that the columns are now in reverse order. Now I call the same positional reference as above, but on df_:

df_.iloc[0, 3]

(0, 0)

I get a different answer because the columns have moved out of their original positions.

However, if I use the same label reference:

df_.loc['A', 'Z']

(0, 3)

I get the same result. So label indexing lets me reference a cell regardless of the order of the rows or columns.
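The difference matters most when the labels themselves are integers, as in OP's data frame, because loc[0, 2] and iloc[0, 2] then look alike. A minimal sketch that reverses OP's columns the same way df_ was built above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Reverse the column order; each label travels with its column of data
df_ = df.sort_index(axis=1, ascending=False)
print(df_.columns.tolist())  # [2, 1, 0]

# Label 2 still refers to the column that holds 3 and 6 ...
print(df_.loc[0, 2])   # 3
# ... but position 2 is now the column labeled 0
print(df_.iloc[0, 2])  # 1
```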

OK! But what about OP's question?

Pandas stores the data in the values attribute:

df.values

array([[(0, 0), (0, 1), (0, 2), (0, 3)],
       [(1, 0), (1, 1), (1, 2), (1, 3)],
       [(2, 0), (2, 1), (2, 2), (2, 3)],
       [(3, 0), (3, 1), (3, 2), (3, 3)]], dtype=object)

the column labels in the columns attribute:

df.columns

Index(['W', 'X', 'Y', 'Z'], dtype='object')

and the row labels in the index attribute:

df.index

Index(['A', 'B', 'C', 'D'], dtype='object')
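The three attributes can be pulled apart directly. A short sketch that rebuilds the 4x4 example; it uses to_numpy(), which the pandas documentation recommends over .values, though both return the same data here:

```python
import pandas as pd

df = pd.DataFrame(
    [[(i, j) for j in range(4)] for i in range(4)],
    index=list('ABCD'),
    columns=list('WXYZ'),
)

# The three pieces: the data, the column labels, and the row labels
data = df.to_numpy()  # same contents as df.values
cols = df.columns
rows = df.index

print(data.shape)     # (4, 4)
print(data[0, 3])     # (0, 3)
print(cols.tolist())  # ['W', 'X', 'Y', 'Z']
print(rows.tolist())  # ['A', 'B', 'C', 'D']
```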

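It so happens that in OP's sample data frame no column labels were passed, so Pandas assigned the default labels [0, 1, 2], one per column. Those values are labels, not sizes: the last value 2 is not the height; the frame simply has 3 columns, and its row labels are likewise the defaults [0, 1].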
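The height and width live in shape, not in the labels. A minimal sketch using OP's array; it calls tolist(), which returns plain Python ints, because with NumPy 2.x the original print(list(df.columns.values)) may display the elements as np.int64(0) and so on:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# The column labels: one label per column, not a size
print(df.columns.tolist())  # [0, 1, 2]

# The sizes: shape is (number of rows, number of columns)
print(df.shape)                        # (2, 3)
print(len(df.index), len(df.columns))  # 2 3
```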

Upvotes: 3

jezrael

Reputation: 862731

If the question is what the columns are, check these samples:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df)
   0  1  2
0  1  2  3
1  4  5  6

# default column names
print(list(df.columns.values))
[0, 1, 2]

print(list(df.index.values))
[0, 1]

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]), columns=list('abc'))
print(df)
   a  b  c
0  1  2  3
1  4  5  6

# custom column names
print(list(df.columns.values))
['a', 'b', 'c']

print(list(df.index.values))
[0, 1]
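Labels can also be set or changed after the data frame is created. A minimal sketch; the row labels 'r1' and 'r2' and the new name 'first' are arbitrary examples, not anything from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Replace all labels at once; the new list must match the axis length
df.columns = ['a', 'b', 'c']
df.index = ['r1', 'r2']  # arbitrary example row labels

# Or rename only some labels, leaving the rest unchanged
df = df.rename(columns={'a': 'first'})

print(df.columns.tolist())  # ['first', 'b', 'c']
print(df.index.tolist())    # ['r1', 'r2']
```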

You can also check the docs:

The axis labeling information in pandas objects serves many purposes:

Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
Enables automatic and explicit data alignment
Allows intuitive getting and setting of subsets of the data set
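The alignment point in the quoted docs is easiest to see with two Series whose labels are in different orders. A minimal sketch; the labels and values are made up for illustration:

```python
import pandas as pd

# Same three labels, listed in a different order
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'a', 'b'])

# Arithmetic pairs values by label, not by position
total = s1 + s2
print(total.to_dict())  # {'a': 21, 'b': 32, 'c': 13}
```

Had the two Series been added by position instead, 'a' would have received 1 + 10; label alignment gives 1 + 20.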

Upvotes: 3
