Reputation: 1328
What's the difference between:
Maand['P_Sanyo_Gesloten']
Out[119]:
Time
2012-08-01 00:00:11 0
2012-08-01 00:05:10 0
2012-08-01 00:10:11 0
2012-08-01 00:20:10 0
2012-08-01 00:25:10 0
2012-08-01 00:30:09 0
2012-08-01 00:40:10 0
2012-08-01 00:50:09 0
2012-08-01 01:05:10 0
2012-08-01 01:10:10 0
2012-08-01 01:15:10 0
2012-08-01 01:25:10 0
2012-08-01 01:30:10 0
2012-08-01 01:35:09 0
2012-08-01 01:40:10 0
...
2012-08-30 22:35:09 0
2012-08-30 22:45:10 0
2012-08-30 22:50:09 0
2012-08-30 22:55:10 0
2012-08-30 23:00:09 0
2012-08-30 23:05:10 0
2012-08-30 23:10:09 0
2012-08-30 23:15:10 0
2012-08-30 23:20:09 0
2012-08-30 23:25:10 0
2012-08-30 23:35:09 0
2012-08-30 23:40:10 0
2012-08-30 23:45:09 0
2012-08-30 23:50:10 0
2012-08-30 23:55:11 0
Name: P_Sanyo_Gesloten, Length: 7413, dtype: int64
And
Maand[[1]]
Out[120]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7413 entries, 2012-08-01 00:00:11 to 2012-08-30 23:55:11
Data columns (total 1 columns):
P_Sanyo_Gesloten 7413 non-null values
dtypes: int64(1)
How can I select a column by its integer position rather than by its name?
Upvotes: 52
Views: 216917
Reputation: 63453
To formalize the comment by Jeff into an answer, this is the simplest I know that works. It is simpler than the prior answer by Andy H which uses a list.
my_col = df.iloc[:, column_index]
Use 0 for the first column, 1 for the second column, and so on. For example:
first_col = df.iloc[:, 0]
As a bonus, if the goal is to iterate over the columns, df.items() is sufficient.
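A minimal sketch of both ideas, assuming a small two-column frame made up for illustration (the names df, second_col and seen are not from the question):

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])

# A scalar position in .iloc returns the column as a Series
second_col = df.iloc[:, 1]
print(type(second_col).__name__, second_col.tolist())

# items() yields (label, Series) pairs in column order, so enumerate()
# recovers the integer position of each column while iterating
seen = []
for position, (label, column) in enumerate(df.items()):
    seen.append((position, label, column.tolist()))
    print(position, label, column.tolist())
```

Passing a list such as df.iloc[:, [1]] instead of a scalar would keep the result as a one-column DataFrame.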
Upvotes: 3
Reputation: 2310
You can also use take to get any column(s) by position:
In [2]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
In [3]: df
Out[3]:
   a  b
0  1  2
1  3  4
In [4]: df.take([1], axis=1)
Out[4]:
   b
0  2
1  4
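As a rough illustration, take accepts several positions at once and returns them in the order given, which matches what .iloc does with a list of positions. The three-column frame below is an assumption made up for this sketch:

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

# Pick the third and first columns, in that order
picked = df.take([2, 0], axis=1)

# The same selection written with .iloc
same = df.iloc[:, [2, 0]]

print(picked.columns.tolist())
print(same.columns.tolist())
```

Both calls return a DataFrame, even when only one position is requested.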
Upvotes: 1
Reputation: 4253
Another way to access a column by number is to use a mapping dictionary, where the key is the column name and the value is the column number:
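import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
# map each column name to its integer position
dct = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
# look up the position of 'D', then select that column by position
print(df.iloc[:, dct['D']])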
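If the column labels are unique, the position can also be looked up from the column index itself rather than from a hand-written dictionary. A small sketch, assuming the same kind of frame as above (the variable names pos, col_by_pos and col_by_name are illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

# Index.get_loc returns the integer position of a label
# (a plain int when the label is unique)
pos = df.columns.get_loc("D")

col_by_pos = df.iloc[:, pos]   # select by position
col_by_name = df["D"]          # select by label

print(pos)
print(col_by_pos.equals(col_by_name))
```

This avoids keeping a separate mapping in sync when columns are added or reordered.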
Upvotes: 1
Reputation: 12049
Another way is to select a column with the columns array:
In [5]: df = pd.DataFrame([[1,2], [3,4]], columns=['a', 'b'])
In [6]: df
Out[6]:
   a  b
0  1  2
1  3  4
In [7]: df[df.columns[0]]
Out[7]:
0 1
1 3
Name: a, dtype: int64
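One caveat worth sketching: this approach goes through the label, so it behaves differently from a purely positional lookup when column names are duplicated. The frame dup below is a made-up example:

```python
import pandas as pd

# Hypothetical frame with a duplicated column label
dup = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

# Label lookup matches every column named "a", so a DataFrame comes back
by_label = dup[dup.columns[0]]

# Positional lookup always returns exactly one column as a Series
by_position = dup.iloc[:, 0]

print(type(by_label).__name__, by_label.shape)
print(type(by_position).__name__, by_position.tolist())
```

With unique column names the two forms return the same Series.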
Upvotes: 23
Reputation: 375685
One is a column (aka Series), while the other is a DataFrame:
In [1]: df = pd.DataFrame([[1,2], [3,4]], columns=['a', 'b'])
In [2]: df
Out[2]:
   a  b
0  1  2
1  3  4
The column 'b' (aka Series):
In [3]: df['b']
Out[3]:
0 2
1 4
Name: b, dtype: int64
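The sub-DataFrame with the column at position 1, selected with a list (this integer fallback only worked in old pandas versions; current versions raise a KeyError here because no column is labeled 1):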
In [4]: df[[1]]
Out[4]:
   b
0  2
1  4
Note: it's preferable (and less ambiguous) to specify whether you're talking about the column name e.g. ['b'] or the integer location, since sometimes you can have columns named as integers:
In [5]: df.iloc[:, [1]]
Out[5]:
   b
0  2
1  4
In [6]: df.loc[:, ['b']]
Out[6]:
   b
0  2
1  4
In [7]: df.loc[:, 'b']
Out[7]:
0 2
1 4
Name: b, dtype: int64
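To make the ambiguity concrete, here is a small sketch with columns that are themselves named with integers. The frame df_int and its labels are assumptions chosen so that label and position disagree:

```python
import pandas as pd

# Hypothetical frame whose column labels are the integers 1 and 0, in that order
df_int = pd.DataFrame([[1, 2], [3, 4]], columns=[1, 0])

# Plain brackets look up the LABEL 1, which is the first column here
by_label = df_int[1]

# .iloc looks up the POSITION 1, which is the second column (labeled 0)
by_position = df_int.iloc[:, 1]

# A list inside the brackets keeps the result as a DataFrame
sub_frame = df_int[[1]]

print(by_label.tolist())
print(by_position.tolist())
print(type(sub_frame).__name__)
```

Writing .loc for labels and .iloc for positions keeps the intent explicit in both cases.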
Upvotes: 59
Reputation: 4989
The following is taken from http://pandas.pydata.org/pandas-docs/dev/indexing.html. There are a few more examples further down that page.
In [816]: df1
           0         2         4         6
0   0.569605  0.875906 -2.211372  0.974466
2  -2.006747 -0.410001 -0.078638  0.545952
4  -1.219217 -1.226825  0.769804 -1.281247
6  -0.727707 -0.121306 -0.097883  0.695775
8   0.341734  0.959726 -1.110336 -0.619976
10  0.149748 -0.732339  0.687738  0.176444
Select via integer slicing
In [817]: df1.iloc[:3]
          0         2         4         6
0  0.569605  0.875906 -2.211372  0.974466
2 -2.006747 -0.410001 -0.078638  0.545952
4 -1.219217 -1.226825  0.769804 -1.281247
In [818]: df1.iloc[1:5,2:4]
          4         6
2 -0.078638  0.545952
4  0.769804 -1.281247
6 -0.097883  0.695775
8 -1.110336 -0.619976
Select via integer list
In [819]: df1.iloc[[1,3,5],[1,3]]
           2         6
2  -0.410001  0.545952
6  -0.121306  0.695775
10 -0.732339  0.176444
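The examples above use even integers as row and column labels, so position and label never coincide except at 0. A reproducible sketch of the same layout (the values are made up with np.arange, not the random numbers from the docs) shows how a positional slice compares with the equivalent label slice:

```python
import numpy as np
import pandas as pd

# Same label layout as df1 above: rows 0, 2, ..., 10 and columns 0, 2, 4, 6
df1 = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=range(0, 12, 2),
    columns=range(0, 8, 2),
)

# .iloc slices by position and excludes the end: rows 1..4, columns 2..3
by_position = df1.iloc[1:5, 2:4]

# .loc slices by label and includes the end: rows 2..8, columns 4..6
by_label = df1.loc[2:8, 4:6]

print(by_position.index.tolist(), by_position.columns.tolist())
print(by_label.index.tolist(), by_label.columns.tolist())
```

Both selections cover the same block of cells; only the way the bounds are expressed differs.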
Upvotes: 18