I have a dataframe with lots of columns.
I first choose only one column from the dataframe by
r_i = df.iloc[:, i: i + 1]
Then I want to turn this r_i into an array simply with
np.array(r_i)
The result I want looks like:
array([-1, -2, -3])
In other words, it should be a one-dimensional array.
However, it gives me a two-dimensional array whose elements are single-item sublists:
array([[-1], [-2], [-3]])
How do I prevent this from happening?
Thank you.
So, given:
>>> df = pd.DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8], 'c':[9,10,11,12]})
>>> i = 1
>>> df
   a  b   c
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
>>> df.iloc[:, i: i + 1]
   b
0  5
1  6
2  7
3  8
>>> np.array(df.iloc[:, i: i + 1])
array([[5],
       [6],
       [7],
       [8]])
You could use the .squeeze method, which removes every axis of length one from your array:
>>> np.array(df.iloc[:, i: i + 1]).squeeze()
array([5, 6, 7, 8])
Although I'd probably just use:
>>> df.iloc[:, i: i + 1].values.squeeze()
array([5, 6, 7, 8])
Or alternatively, you could always use .reshape, which should be your first instinct when you want to reshape an array:
>>> np.array(df.iloc[:, i: i + 1]).reshape(-1)
array([5, 6, 7, 8])
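To see what squeeze and reshape(-1) actually change, it can help to look at the shapes rather than the printed arrays. A minimal sketch, reusing the df and i from above (the variable names two_d, squeezed and flattened are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1

# A one-column slice is still a DataFrame, so its values are two-dimensional.
two_d = df.iloc[:, i: i + 1].values

squeezed = two_d.squeeze()     # drops every axis of length one
flattened = two_d.reshape(-1)  # flattens everything into one dimension

print(two_d.shape)      # (4, 1)
print(squeezed.shape)   # (4,)
print(flattened.shape)  # (4,)
```

For a single column with more than one row, both end up with the same one-dimensional result.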
Note, these will behave differently if you accidentally take an extra column, so:
>>> np.array(df.iloc[:, i: i + 2])
array([[ 5,  9],
       [ 6, 10],
       [ 7, 11],
       [ 8, 12]])
With reshape:
>>> np.array(df.iloc[:, i: i + 2]).reshape(-1)
array([ 5,  9,  6, 10,  7, 11,  8, 12])
With squeeze:
>>> np.array(df.iloc[:, i: i + 2]).squeeze()
array([[ 5,  9],
       [ 6, 10],
       [ 7, 11],
       [ 8, 12]])
Ideally, you'd probably just want that to fail, so if you want to program defensively, use reshape with explicit parameters instead of -1:
>>> np.array(df.iloc[:, i: i + 1]).reshape((df.shape[0],))
array([5, 6, 7, 8])
>>> np.array(df.iloc[:, i: i + 2]).reshape((df.shape[0],))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 8 into shape (4,)
>>>
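Another way to get the same fail-fast behaviour, if you prefer squeeze, is to pass it an explicit axis: NumPy raises a ValueError when the named axis does not have length one. A rough sketch, where column_to_1d is a hypothetical helper name and not part of pandas or NumPy:

```python
import pandas as pd

def column_to_1d(frame, start, stop):
    """Hypothetical helper: return frame.iloc[:, start:stop] as a 1-D array.

    Raises ValueError unless the slice holds exactly one column.
    """
    values = frame.iloc[:, start:stop].values
    # squeeze(axis=1) only succeeds when axis 1 has length one.
    return values.squeeze(axis=1)

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1

one_col = column_to_1d(df, i, i + 1)

try:
    column_to_1d(df, i, i + 2)  # two columns: axis 1 has length two
    failed = False
except ValueError:
    failed = True

print(one_col.shape, failed)
```

Unlike a bare .squeeze(), this also keeps the result one-dimensional when the frame has only a single row, since only axis 1 is removed.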
You could avoid this by not doing an unnecessary slice, so:
>>> df.iloc[:, i: i + 1]
   b
0  5
1  6
2  7
3  8
>>> df.iloc[:, i + 1]
0     9
1    10
2    11
3    12
Name: c, dtype: int64
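The latter gives you a Series, which is already one-dimensional. (Note that the integer index i + 1 picks column c here; df.iloc[:, i] would pick the same column b as the slice i: i + 1.) So you could just use: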
>>> df.iloc[:, i + 1].values
array([ 9, 10, 11, 12])
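Putting that together for the column the question actually asks about, a short sketch assuming a reasonably recent pandas (0.24 or later), where .to_numpy() is available as an alternative to .values (the names by_position and by_label are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1

# A single integer (not a slice) returns a Series, which is one-dimensional.
by_position = df.iloc[:, i].to_numpy()

# Selecting by column label gives the same Series.
by_label = df['b'].to_numpy()

print(by_position.shape)  # (4,)
print(by_label.shape)     # (4,)
```

Either way there is no extra dimension to remove afterwards.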