Reputation: 1777

Creating pandas dataframes from several numpy series

I'm trying to create a pandas data frame where the columns are numpy arrays. I also want to name the columns at creation.

This seems like a very simple task.

It works ok-ish without naming the columns, although the columns are in the wrong order:

import numpy as np
import pandas as pd

n_obs = 500

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80)) 

print(df.head())

Output:

49  3.802458
57  3.830600
29  4.991442
47  2.600079
70  1.658041
52  2.236296
37  3.327520
23  1.366954
22  1.509165
36  1.289901
77  3.834789
68  4.370223
40  4.532152
71  2.348842

When I try to name the columns I get an error:

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])

Output:

Traceback (most recent call last):
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4622, in create_block_manager_from_blocks
    placement=slice(0, len(axes[0])))]
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fake.py", line 33, in <module>
    df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 361, in __init__
    copy=copy)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 533, in _init_ndarray
    return create_block_manager_from_blocks([values], [columns, index])
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4631, in create_block_manager_from_blocks
    construction_error(tot_items, blocks[0].shape[1:], axes, e)
  File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py", line 4608, in construction_error
    passed, implied))
ValueError: Shape of passed values is (1, 500), indices imply (2, 500)

I can't find a tutorial that covers this. It's obviously a very simple problem, but I cannot find a solution.

Upvotes: 2

Answers (3)

jezrael

Reputation: 863791

Pass the arrays to the DataFrame constructor as a dict:

n_obs = 500

a = np.random.uniform(low = 1.1, high = 5.0,size = (n_obs))
b = np.random.randint(size = (n_obs), low = 18, high = 80)

df = pd.DataFrame({'col1':a, 'col2':b}) 
print (df.head())
       col1  col2
0  2.070148    23
1  1.735960    28
2  4.156209    72
3  4.253241    26
4  3.539951    45

If you use Python below 3.6, you can add the columns parameter to specify the column order (from Python 3.6 onwards, the standard dict type maintains insertion order by default):

df = pd.DataFrame({'col1':a, 'col2':b}, columns=['col2','col1']) 
print (df.head())
   col2      col1
0    23  2.070148
1    28  1.735960
2    72  4.156209
3    26  4.253241
4    45  3.539951

You can also stack the arrays in numpy, but then every column gets the same dtype, here floats:

df = pd.DataFrame(np.column_stack((a,b)), columns=['col1','col2']) 
print (df.head())
       col1  col2
0  2.070148  23.0
1  1.735960  28.0
2  4.156209  72.0
3  4.253241  26.0
4  3.539951  45.0
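If the integer column matters, the float upcast can be undone after the fact. A minimal sketch, assuming a smaller array length of 5 and a fixed seed purely for illustration (neither is from the question):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed, only so the run is repeatable
a = np.random.uniform(low=1.1, high=5.0, size=5)
b = np.random.randint(low=18, high=80, size=5)

# column_stack builds a single 2-D float array, so both columns become float64.
stacked = pd.DataFrame(np.column_stack((a, b)), columns=['col1', 'col2'])
print(stacked.dtypes)

# Cast the second column back to integers; the values are whole numbers,
# so the round trip through float loses nothing here.
restored = stacked.astype({'col2': 'int64'})
print(restored.dtypes)
```

The dict approach above avoids the cast entirely, because each array keeps its own dtype.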

Also, in your solution:

df = pd.DataFrame(a, b)

The first array becomes the column and the second becomes the index, so it is the same as:

df = pd.DataFrame(a, index=b) 
print (df.head())
           0
23  2.070148
28  1.735960
72  4.156209
26  4.253241
45  3.539951
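This is also where the error in the question comes from: there is only one column of data, so two column names cannot fit. A small sketch of that, assuming an array length of 5 and a fixed seed for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed, only so the run is repeatable
a = np.random.uniform(low=1.1, high=5.0, size=5)
b = np.random.randint(low=18, high=80, size=5)

# The second positional argument of pd.DataFrame is `index`, not another column.
df = pd.DataFrame(a, b)
print(df.shape)  # one data column, with b used as the row labels

# Asking for two column names on one column of data raises ValueError.
error_message = None
try:
    pd.DataFrame(a, b, columns=['col1', 'col2'])
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', error_message)
```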

Upvotes: 4

jpp

Reputation: 164843

`pd.concat` + `pd.Series`

You can convert to series and concatenate:

np.random.seed(0)

n_obs = 500
a = np.random.uniform(low=1.1, high=5.0, size=n_obs)
b = np.random.randint(size=n_obs, low=18, high=80)

df = pd.concat(map(pd.Series, (a, b)), axis=1, keys=['a', 'b'])

print(df.head())

          a   b
0  3.240373  57
1  3.889239  60
2  3.450777  77
3  3.225044  46
4  2.752254  42
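Because each array is wrapped in its own Series, each column should keep its own dtype, as with the dict approach. A quick check, assuming a shorter array length of 5 for illustration and using a list instead of `map` (a stylistic choice, not required):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
a = np.random.uniform(low=1.1, high=5.0, size=5)
b = np.random.randint(low=18, high=80, size=5)

# One Series per array, concatenated side by side; `keys` supplies the column names.
via_concat = pd.concat([pd.Series(x) for x in (a, b)], axis=1, keys=['a', 'b'])

# The dict constructor, for comparison.
via_dict = pd.DataFrame({'a': a, 'b': b})

print(via_concat.dtypes)            # float column and integer column stay separate
print(via_concat.equals(via_dict))  # compares labels, dtypes and values
```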

Upvotes: 2

sync11

Reputation: 1280

Take a look:

n_obs = 500
df = pd.DataFrame([np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , 
                  np.random.randint(size = (n_obs), low = 18, high = 80)]).T
df.columns = ['col1','col2']

Upvotes: 1
