user3461238

Reputation: 201

python pandas create dataframe and force multiple column types

I was able to create a DataFrame and force a single data type with:

import pandas as pd
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=int)

But I want to specify a type for each column. How can I do this? I tried the following, which doesn't work: the resulting dtypes are object, and column b is not cast to integers.

test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=[('a', int),('b', int)])

Jeff helped with the above case. But I found another problem: when I create an empty DataFrame, I also want to be able to specify the column types. For a single type across all columns, I can do:

test = pd.DataFrame(columns=['a','b'], dtype=int)

What if I want to specify a different type for each of 'a' and 'b'?

Upvotes: 17

Views: 10832

Answers (3)

Dobedani

Reputation: 578

Yes, good question. You can specify one common dtype when you create the DataFrame, or add empty numpy arrays with different dtypes. Nevertheless, in my experience pandas tends to infer the dtypes of the DataFrame from the data you add. I find it better to set the dtypes of the individual columns after you have added your data to the DataFrame:

convert_dict = {'a': int, 'b': float}
df = df.astype(convert_dict)
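
To make that concrete, here is a minimal sketch using the data from the question. The dtype strings ('int32', 'int64', 'float64') are my own choice for illustration, and the empty-frame variant at the end is an assumption about how the same call can be applied to the second half of the question, not something I have tested across pandas versions.

```python
import pandas as pd

# Build the frame first and let pandas infer the dtypes (int64 and float64 here).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1.1, 2.1, 3.1]})

# Then cast each column explicitly with a {column: dtype} mapping.
# Casting float to int drops the fractional part.
convert_dict = {'a': 'int32', 'b': 'int64'}
df = df.astype(convert_dict)
print(df.dtypes)

# The same call can be applied to an empty frame that only has column names.
empty = pd.DataFrame(columns=['a', 'b']).astype({'a': 'int64', 'b': 'float64'})
print(empty.dtypes)
```

Note that the float-to-int cast on 'b' silently truncates 1.1, 2.1, 3.1 to 1, 2, 3, so check that this is what you want before casting.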

Upvotes: 4

Eric G.

Reputation: 642

You can pass in a dictionary of numpy arrays with specified dtypes; this works for creating both filled and empty DataFrames. (This answer is a slight adaptation of my answer here.)

Here's an empty DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': np.array([], dtype=int),
                        'b': np.array([], dtype=float)})

Here's a filled DataFrame:

df = pd.DataFrame(data={'a' : np.array([1,2,3], dtype=int),
                        'b' : np.array([4,5,6], dtype=float)
                       }
                 )

You can use basically any type for dtype, such as object, str, datetime.datetime or CrazyClassYouDefined. That said, if pandas doesn't specifically support a type (such as str), it falls back to treating that column as object. Don't worry though, everything should still work.
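
As a rough illustration of that fallback, the sketch below builds an empty DataFrame from arrays with three different dtypes and prints what pandas ends up storing. The column names ('when', 'anything', 'label') are made up for the example. I would expect the str column to show up as object on the pandas versions I know, but newer releases may report a dedicated string dtype instead, so the code only prints that one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={
    # A dtype pandas supports natively is kept as-is.
    'when': np.array([], dtype='datetime64[ns]'),
    # object is the generic catch-all column type.
    'anything': np.array([], dtype=object),
    # numpy stores str as a fixed-width unicode dtype, which pandas does not keep.
    'label': np.array([], dtype=str),
})

# Inspect what pandas actually stored for each column.
print(df.dtypes)
```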

Upvotes: 3

Jeff

Reputation: 129038

You can pass in a Series, which takes a dtype parameter:

In [15]: pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}).dtypes
Out[15]: 
a      int64
b    float64
dtype: object

In [16]: pd.DataFrame({'a':pd.Series([1,2,3],dtype='int32'), 'b':pd.Series([1.1,2.1,3.1],dtype='float32')}).dtypes
Out[16]: 
a      int32
b    float32
dtype: object
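
The same idea should carry over to the empty case from the edited question. This is a small sketch, assuming it is enough to give each column an empty Series with the dtype you want; the variable name empty is just for illustration.

```python
import pandas as pd

# One empty Series per column, each carrying its own dtype.
empty = pd.DataFrame({
    'a': pd.Series(dtype='int32'),
    'b': pd.Series(dtype='float32'),
})

print(empty.dtypes)
print(len(empty))
```

Rows appended later may still change the dtypes (for example if they introduce missing values into an integer column), so it is worth re-checking dtypes after filling the frame.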

Upvotes: 7
