Reputation: 332
I want to select all the data inside a DataFrame (excluding the index, the column labels, and the right-most column; see the image below) and store it in a Series. This might be obvious, but I cannot get anything working. I have tried, for example, a = nai_data.ix[0:19]
but it returns a new DataFrame, again with all the indices, and I need just a Series of the data. So I tried a = pd.Series(nai_data.ix[0:19])
but that didn't help either. I am sure there must be a simple way to do this, but I can't find it. Any help is appreciated.
Upvotes: 0
Views: 304
Reputation: 880199
Perhaps you are looking for stack(), which can be thought of as moving the column index into the row index:
In [12]: np.random.seed(2015)
In [13]: df = pd.DataFrame(np.random.randint(10, size=(3,4)))
In [14]: df
Out[14]:
   0  1  2  3
0  2  2  9  6
1  8  5  7  8
2  0  6  7  8
In [15]: df.stack()
Out[15]:
0  0    2
   1    2
   2    9
   3    6
1  0    8
   1    5
   2    7
   3    8
2  0    0
   1    6
   2    7
   3    8
dtype: int64
If you don't want the MultiIndex, call reset_index():
In [16]: df.stack().reset_index(drop=True)
Out[16]:
0     2
1     2
2     9
3     6
4     8
5     5
6     7
7     8
8     0
9     6
10    7
11    8
dtype: int64
To select all but the last column, you could use df.iloc:
In [17]: df.iloc[:, :-1]
Out[17]:
   0  1  2
0  2  2  9
1  8  5  7
2  0  6  7
In [18]: df.iloc[:, :-1].stack()
Out[18]:
0  0    2
   1    2
   2    9
1  0    8
   1    5
   2    7
2  0    0
   1    6
   2    7
dtype: int64
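Applied to the situation in the question, the pieces above might be combined as in the sketch below. The contents of nai_data are not shown, so the frame here is a hypothetical stand-in with a default integer index; the sketch also assumes the 0:19 slice was meant to cover the first 20 rows. Note that .ix was removed in pandas 1.0, so positional selection with .iloc is used instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's nai_data: 25 rows, 5 columns.
rng = np.random.default_rng(0)
nai_data = pd.DataFrame(rng.integers(0, 10, size=(25, 5)))

# First 20 rows by position (the end bound of an iloc slice is exclusive)
# and every column except the last one.
block = nai_data.iloc[:20, :-1]

# Flatten row by row into a single Series with a plain 0..n-1 index.
a = block.stack().reset_index(drop=True)

print(type(a).__name__, len(a))
```

With 20 rows and 4 remaining columns, the resulting Series holds 80 values.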
Another way would be to slice and flatten the underlying NumPy array:
In [21]: df.values
Out[21]:
array([[2, 2, 9, 6],
       [8, 5, 7, 8],
       [0, 6, 7, 8]])
In [22]: df.values[:, :-1]
Out[22]:
array([[2, 2, 9],
       [8, 5, 7],
       [0, 6, 7]])
In [23]: df.values[:, :-1].ravel()
Out[23]: array([2, 2, 9, 8, 5, 7, 0, 6, 7])
and then just build the Series using this data:
In [24]: pd.Series(df.values[:, :-1].ravel())
Out[24]:
0    2
1    2
2    9
3    8
4    5
5    7
6    0
7    6
8    7
dtype: int64
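As a rough, self-contained sketch, the NumPy route can be wrapped in a small helper and checked against the stack() route. The helper name flatten_without_last_column is made up for illustration, and the sketch assumes pandas 0.24 or later, where to_numpy() is available as an alternative to .values. The frame is written out literally with the same values as the example above, so it does not depend on the random seed.

```python
import pandas as pd

# Same values as the example frame above, written out literally.
df = pd.DataFrame([[2, 2, 9, 6],
                   [8, 5, 7, 8],
                   [0, 6, 7, 8]])


def flatten_without_last_column(frame):
    """Hypothetical helper: all but the last column as a flat Series, row by row."""
    values = frame.iloc[:, :-1].to_numpy()
    return pd.Series(values.ravel())


via_numpy = flatten_without_last_column(df)
via_stack = df.iloc[:, :-1].stack().reset_index(drop=True)

print(via_numpy.tolist())  # [2, 2, 9, 8, 5, 7, 0, 6, 7]
print(via_numpy.equals(via_stack))
```

For a frame without missing values like this one, both routes should give the same Series; the NumPy version skips building the intermediate MultiIndex.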
Upvotes: 1