Reputation: 11633
This post provides an elegant way to create an empty pandas DataFrame of a specified data type. If you initialize it with np.nan values, the data type is set to float:
df_training_outputs = pd.DataFrame(np.nan, index=index, columns=column_names)
But I want to create an empty DataFrame with a different data type in each column. It seems the dtype keyword argument accepts only a single type.
Background: I am writing a script that generates data incrementally, so I need somewhere to store it while the script runs. I thought an empty DataFrame (large enough to hold all the expected data) would be the best way to do this. This must be a fairly common task, so if someone has a better way, please advise.
Upvotes: 2
Views: 4378
Reputation: 3481
One way to create an empty DataFrame with columns of different types is to pass an empty NumPy array with the appropriate structured dtype. Note in the output below that the fixed-width string field ('S20') ends up as an object column:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.empty(0, dtype=[('a', 'u4'), ('b', 'S20'), ('c', 'f8')]))
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Data columns (total 3 columns):
a 0 non-null uint32
b 0 non-null object
c 0 non-null float64
dtypes: float64(1), object(1), uint32(1)
memory usage: 76.0+ bytes
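If you would rather not go through a structured array, a possible alternative is to build the frame from one array or Series per column. The following is only a minimal sketch: the column names, the dtypes, and `n_rows` are illustrative assumptions rather than anything from the question, and the preallocated variant simply fills each column with a placeholder value (0, None, NaN) that you overwrite as data arrives.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and dtypes, chosen only for illustration.
column_dtypes = {"a": "uint32", "b": "object", "c": "float64"}

# Empty frame: one empty Series per column, each with its own dtype.
empty_df = pd.DataFrame(
    {name: pd.Series(dtype=dt) for name, dt in column_dtypes.items()}
)

# Preallocated frame: n_rows placeholder rows, one typed array per column.
n_rows = 5  # assumed upper bound on the number of rows the script produces
prealloc_df = pd.DataFrame(
    {
        "a": np.zeros(n_rows, dtype="uint32"),      # placeholder 0
        "b": np.full(n_rows, None, dtype=object),   # placeholder None
        "c": np.full(n_rows, np.nan),               # placeholder NaN (float64)
    }
)

# Fill in the first row as data becomes available.
prealloc_df.at[0, "a"] = 7
prealloc_df.at[0, "b"] = "first"
prealloc_df.at[0, "c"] = 1.5
```

Writing values that fit the column's type, as above, should leave the numeric dtypes unchanged. If the final number of rows is not known in advance, collecting rows in a plain list and constructing the DataFrame once at the end is usually simpler than preallocating.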
Upvotes: 3