Jun Seong Jang

Reputation: 395

Pandas dataframe to 1-d array

I have a dataframe with lots of columns.

I first select a single column from the DataFrame with r_i = df.iloc[:, i: i + 1].

Then I want to turn r_i into an array with np.array(r_i).

The result I want looks like array([-1, -2, -3]); in other words, a flat, one-dimensional array.

However, I get a two-dimensional array in which each value sits in its own sublist: array([[-1], [-2], [-3]]).

How do I prevent this from happening?

Thank you.

Upvotes: 3

Views: 11389

Answers (2)

Santosh Katuwal

Reputation: 101

df.values.flatten()

Here, df is your DataFrame; for the question's case, that would be the single-column slice r_i.
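
A minimal sketch of how this plays out, assuming a small made-up frame (the column names and values are illustrative, not from the question). Note that flatten on a multi-column frame returns every value, row by row:

```python
import numpy as np
import pandas as pd

# Hypothetical example data, chosen to mirror the question's expected output.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [-1, -2, -3]})
i = 1

r_i = df.iloc[:, i: i + 1]     # single-column DataFrame, shape (3, 1)
flat = r_i.values.flatten()    # 1-d copy of that column, shape (3,)
whole = df.values.flatten()    # all columns, flattened in row-major order
```

Here flat holds only column b, while whole interleaves both columns, which is probably not what the question is after.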

Upvotes: 6

juanpa.arrivillaga

Reputation: 95948

So, given:

>>> df = pd.DataFrame({'a':[1,2,3,4], 'b':[5,6,7,8], 'c':[9,10,11,12]})
>>> i = 1
>>> df
   a  b   c
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
>>> df.iloc[:, i: i + 1]
   b
0  5
1  6
2  7
3  8
>>> np.array(df.iloc[:, i: i + 1])
array([[5],
       [6],
       [7],
       [8]])

You could use the .squeeze method, which removes every length-one dimension from your array:

>>> np.array(df.iloc[:, i: i + 1]).squeeze()
array([5, 6, 7, 8])

Although I'd probably just use:

>>> df.iloc[:, i: i + 1].values.squeeze()
array([5, 6, 7, 8])
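
One caveat worth sketching: because squeeze drops every length-one axis, a slice that happens to contain a single row collapses all the way to a zero-dimensional array. The snippet below is only an illustration on the same example frame; passing axis=1 is one way to restrict squeeze to the column axis:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1

one_cell = np.array(df.iloc[:1, i: i + 1])   # one row, one column: shape (1, 1)

fully_squeezed = one_cell.squeeze()          # both axes removed: shape ()
column_squeezed = one_cell.squeeze(axis=1)   # only the column axis removed: shape (1,)
```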

Alternatively, you could always use .reshape, which should be your first instinct when you want to reshape an array:

>>> np.array(df.iloc[:, i: i + 1]).reshape(-1)
array([5, 6, 7, 8])

Note that these behave differently if you accidentally take an extra column. Given:

>>> np.array(df.iloc[:, i: i + 2])
array([[ 5,  9],
       [ 6, 10],
       [ 7, 11],
       [ 8, 12]])

With reshape:

>>> np.array(df.iloc[:, i: i + 2]).reshape(-1)
array([ 5,  9,  6, 10,  7, 11,  8, 12])

With squeeze:

>>> np.array(df.iloc[:, i: i + 2]).squeeze()
array([[ 5,  9],
       [ 6, 10],
       [ 7, 11],
       [ 8, 12]])

Ideally, you'd probably just want that to fail, so if you want to program defensively, use reshape with an explicit shape instead of -1:

>>> np.array(df.iloc[:, i: i + 1]).reshape((df.shape[0],))
array([5, 6, 7, 8])
>>> np.array(df.iloc[:, i: i + 2]).reshape((df.shape[0],))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 8 into shape (4,)
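
If you do this in several places, the defensive check could be wrapped in a small function. This is only a sketch: column_slice_to_1d is a hypothetical name, and the explicit column-count check is an assumption about what "fail" should mean here.

```python
import numpy as np
import pandas as pd

def column_slice_to_1d(frame):
    """Hypothetical helper: turn a single-column DataFrame into a 1-d array."""
    n_rows, n_cols = frame.shape
    if n_cols != 1:
        # Fail loudly instead of silently flattening several columns.
        raise ValueError(f"expected exactly one column, got {n_cols}")
    return frame.values.reshape((n_rows,))

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1
result = column_slice_to_1d(df.iloc[:, i: i + 1])
```

Calling it with df.iloc[:, i: i + 2] raises ValueError rather than returning a flattened mix of two columns.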

However, you could avoid all of this by not taking an unnecessary slice in the first place. Compare:

>>> df.iloc[:, i: i + 1]
   b
0  5
1  6
2  7
3  8
>>> df.iloc[:, i]
0    5
1    6
2    7
3    8
Name: b, dtype: int64

The latter gives you a Series, which is already one-dimensional, so you could just use:

>>> df.iloc[:, i].values
array([5, 6, 7, 8])
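
As an aside, on pandas 0.24 or later, Series.to_numpy() does the same job as .values and is what the pandas documentation recommends. A minimal sketch on the same example frame, selecting column b by its integer position:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
i = 1

# An integer position (not a slice) returns a Series, which is already 1-d.
col = df.iloc[:, i].to_numpy()
```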

Upvotes: 3
