Reputation: 201
I was able to create a DataFrame and force a single data type with:
import pandas as pd
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=int)
But I want to specify a type for each column. How can I do this? I tried the following, which doesn't work: the resulting dtypes are object, and column b is not cast to integers.
test = pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}, dtype=[('a', int),('b', int)])
Jeff's answer solved the case above. But I ran into another problem: I want to create an empty DataFrame and still be able to specify the column types. For a single type across all columns, I can do:
test = pd.DataFrame(columns=['a','b'], dtype=int)
What if I want to specify a different type for each of 'a' and 'b'?
Upvotes: 17
Views: 10832
Reputation: 578
Yes, good question. You can specify one common dtype when you create the DataFrame, or add empty numpy arrays with different dtypes. In my experience, though, pandas tends to infer the dtypes again from the data you add. I find it better to set the dtype of each column after you have added your data to the DataFrame:
convert_dict = {'a': int, 'b': float}
df = df.astype(convert_dict)
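As a minimal sketch of the snippet above, the following applies a per-column mapping with `astype` to both a filled and an empty DataFrame. The explicit dtype strings ('int32', 'int64', and so on) are my own choice for illustration, to keep the result the same across platforms; the answer itself uses `int` and `float`.

```python
import pandas as pd

# Filled frame: pandas infers int64 / float64, then astype narrows each column.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [1.1, 2.1, 3.1]})
df = df.astype({'a': 'int32', 'b': 'float32'})
dtypes = df.dtypes.astype(str).to_dict()
print(dtypes)

# Empty frame: columns start out as object, and astype sets each one.
empty = pd.DataFrame(columns=['a', 'b']).astype({'a': 'int64', 'b': 'float64'})
empty_dtypes = empty.dtypes.astype(str).to_dict()
print(empty_dtypes)
```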
Upvotes: 4
Reputation: 642
You can pass in a dictionary of numpy arrays with specified dtypes; this works for creating both filled and empty DataFrames. (This answer is a slight adaptation of my answer here.)
Here's an empty DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'a': np.array([], dtype=int),
                        'b': np.array([], dtype=float)})
Here's a filled DataFrame:
df = pd.DataFrame(data={'a': np.array([1, 2, 3], dtype=int),
                        'b': np.array([4, 5, 6], dtype=float)})
And you can use basically any type for dtype, such as object, str, datetime.datetime or CrazyClassYouDefined. That said, if pandas doesn't specifically support a type (such as str), pandas will fall back to treating that column as object. Don't worry though, everything should still work.
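If you build empty frames like this often, the dictionary-of-arrays idea can be wrapped in a small helper. This is only a sketch: `empty_frame` and the `schema` argument are hypothetical names of mine, not part of pandas or numpy, and the dtype strings are chosen for illustration.

```python
import numpy as np
import pandas as pd

def empty_frame(schema):
    """Build an empty DataFrame from a {column name: dtype} mapping.

    Hypothetical helper: each column is backed by a zero-length numpy array
    created with the requested dtype.
    """
    return pd.DataFrame({name: np.array([], dtype=dt) for name, dt in schema.items()})

df = empty_frame({'a': 'int64', 'b': 'float64', 'when': 'datetime64[ns]'})
dtypes = df.dtypes.astype(str).to_dict()
print(dtypes)
```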
Upvotes: 3
Reputation: 129038
You can pass in a Series for each column; the Series constructor takes a dtype parameter:
In [15]: pd.DataFrame({'a':[1,2,3], 'b':[1.1,2.1,3.1]}).dtypes
Out[15]:
a      int64
b    float64
dtype: object
In [16]: pd.DataFrame({'a': pd.Series([1,2,3], dtype='int32'), 'b': pd.Series([1.1,2.1,3.1], dtype='float32')}).dtypes
Out[16]:
a      int32
b    float32
dtype: object
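The same approach should also cover the empty case from the question, since a Series can be created with a dtype and no data. A minimal sketch, assuming the int32/float32 dtypes used in this answer:

```python
import pandas as pd

# Each column is an empty Series that already carries its own dtype.
empty = pd.DataFrame({
    'a': pd.Series(dtype='int32'),
    'b': pd.Series(dtype='float32'),
})
empty_dtypes = empty.dtypes.astype(str).to_dict()
print(empty_dtypes)
```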
Upvotes: 7