Reputation: 233
I am learning Pandas DataFrame and came across this code:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
Now when I use print(list(df.columns.values))
as suggested on this page, the output is:
[0, 1, 2]
I am unable to understand the output. What do the values 0, 1, 2 signify? Since the height of the DataFrame is 2, I suppose the last value, 2, signifies the height. What about 0 and 1?
I apologize if this question is a duplicate. I couldn't find any relevant explanation. If there is any similar question, please mention the link.
Many thanks.
Upvotes: 2
Views: 97
Reputation: 294308
df is a data frame. Take a step back and take in what that means, outside of what it means from a Pandas perspective. Though there are many nuances to what different people mean by a data frame, generally it is a table of data with rows and columns.
Consider the example data frame df. I create a 4x4 table with tuples in each cell representing the (row, column) position of that cell. You'll also notice the labels on the rows are ['A', 'B', 'C', 'D'] and the labels on the columns are ['W', 'X', 'Y', 'Z'].
df = pd.DataFrame(
    [[(i, j) for j in range(4)] for i in range(4)],
    index=list('ABCD'), columns=list('WXYZ')
)
df
        W       X       Y       Z
A  (0, 0)  (0, 1)  (0, 2)  (0, 3)
B  (1, 0)  (1, 1)  (1, 2)  (1, 3)
C  (2, 0)  (2, 1)  (2, 2)  (2, 3)
D  (3, 0)  (3, 1)  (3, 2)  (3, 3)
If we wanted to reference by position, the cell at row 0 and column 3 (counting from zero) is the one highlighted here.
df.style.applymap(lambda x: 'background: #aaf' if x == (0, 3) else '')
We could get at that position with iloc (which handles ordinal/positional indexing):
df.iloc[0, 3]
(0, 3)
What makes Pandas special is that it gives us an alternative way to reference the rows and/or the columns. We could reference by the labels using loc (which handles label indexing):
df.loc['A', 'Z']
(0, 3)
I intentionally labeled the rows and columns with letters so as to not confuse label indexing with positional indexing. In your data frame, you let Pandas give you a default index for both rows and columns and those labels end up just being equivalent to positions when you begin.
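To make that concrete for the frame in the question, here is a minimal sketch. The variable names column_labels, row_labels, n_rows and n_cols are mine, chosen for illustration. It shows that the default labels simply count from zero along each axis, so [0, 1, 2] names the three columns and has nothing to do with the height.

```python
import numpy as np
import pandas as pd

# The frame from the question: 2 rows, 3 columns, no labels supplied.
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# With no labels given, pandas falls back to 0..n-1 on each axis.
column_labels = list(df.columns)  # one label per column
row_labels = list(df.index)       # one label per row
n_rows, n_cols = df.shape

print(column_labels)   # [0, 1, 2]
print(row_labels)      # [0, 1]
print(n_rows, n_cols)  # 2 3

# While the labels are still the defaults, label-based and
# position-based lookups reach the same cell.
print(df.loc[0, 2], df.iloc[0, 2])  # 3 3
```

There are three column labels because there are three columns, and two row labels because there are two rows.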
Consider this modified version of our data frame. Let's call it df_
df_ = df.sort_index(axis=1, ascending=False)
df_
        Z       Y       X       W
A  (0, 3)  (0, 2)  (0, 1)  (0, 0)
B  (1, 3)  (1, 2)  (1, 1)  (1, 0)
C  (2, 3)  (2, 2)  (2, 1)  (2, 0)
D  (3, 3)  (3, 2)  (3, 1)  (3, 0)
Notice that the columns are in reverse order. And when I call the same positional reference as above but on df_
df_.iloc[0, 3]
(0, 0)
I get a different answer because my columns have shifted around and are out of their original position.
However, if I call the same label reference
df_.loc['A', 'Z']
(0, 3)
I get the same thing. So label indexing allows me to reference regardless of the order of rows or columns.
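The same effect shows up with default integer labels, which is where it tends to cause confusion. The sketch below reuses the frame from the question and reverses its columns; reversed_df, by_label and by_position are illustrative names of my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Reverse the column order; each label travels with its data.
reversed_df = df[[2, 1, 0]]

by_label = reversed_df.loc[0, 2]      # the column labelled 2
by_position = reversed_df.iloc[0, 2]  # whatever now sits in the third slot

print(list(reversed_df.columns))  # [2, 1, 0]
print(by_label)                   # 3
print(by_position)                # 1
```

Once the order changes, the integer 2 means different things to loc and iloc, even though both looked identical on the original frame.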
Pandas exposes the underlying data through the values attribute:
df.values
array([[(0, 0), (0, 1), (0, 2), (0, 3)],
       [(1, 0), (1, 1), (1, 2), (1, 3)],
       [(2, 0), (2, 1), (2, 2), (2, 3)],
       [(3, 0), (3, 1), (3, 2), (3, 3)]], dtype=object)
The column labels are in the columns attribute:
df.columns
Index(['W', 'X', 'Y', 'Z'], dtype='object')
And the row labels are in the index attribute:
df.index
Index(['A', 'B', 'C', 'D'], dtype='object')
It so happens that in OP's sample data frame, the column labels were the defaults, [0, 1, 2]. They name the three columns and say nothing about the height.
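As a rough illustration of how those three pieces fit together, the sketch below takes a small frame apart and rebuilds it. The frame and the names data, rows, cols and rebuilt are made up for the example, and I use to_numpy(), which for this frame gives the same array as values.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 20], [30, 40]],
    index=['A', 'B'],
    columns=['W', 'X'],
)

# Pull the frame apart into data, row labels, and column labels...
data = df.to_numpy()
rows = df.index
cols = df.columns

# ...and put it back together from those three pieces.
rebuilt = pd.DataFrame(data, index=rows, columns=cols)

print(data.shape)          # (2, 2)
print(rebuilt.equals(df))  # True
```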
Upvotes: 3
Reputation: 862731
If the question is what the columns are, check these samples:
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df)
   0  1  2
0  1  2  3
1  4  5  6
# default column names
print(list(df.columns.values))
[0, 1, 2]
print(list(df.index.values))
[0, 1]
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]), columns=list('abc'))
print(df)
   a  b  c
0  1  2  3
1  4  5  6
# custom column names
print(list(df.columns.values))
['a', 'b', 'c']
print(list(df.index.values))
[0, 1]
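Labels can also be set after the frame exists. A small sketch, assuming you want to swap the default column labels for letters (before and after are just illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
before = list(df.columns)

# Replace the default labels; the data itself is untouched.
df.columns = ['a', 'b', 'c']
after = list(df.columns)

print(before)  # [0, 1, 2]
print(after)   # ['a', 'b', 'c']

# The label 'c' and the position 2 now point at the same column.
print(df.loc[1, 'c'], df.iloc[1, 2])  # 6 6
```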
You can also check the docs:
The axis labeling information in pandas objects serves many purposes:
Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
Enables automatic and explicit data alignment
Allows intuitive getting and setting of subsets of the data set
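The alignment point from the docs is easiest to see with two Series whose labels are in a different order. This is only a sketch with made-up data (s1, s2 and total are my names):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'b', 'a'])

# Addition matches values by label, not by position.
total = s1 + s2

print(total['a'], total['b'], total['c'])  # 31 22 13
```

If the values had been added by position instead, the result would have been 11, 22, 33.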
Upvotes: 3