Bill

Reputation: 11633

How do I create an empty pandas DataFrame with different data types assigned to each column?

This post provides an elegant way to create an empty pandas DataFrame of a specified data type. If you initialize it with np.nan values, the data type is set to float:

df_training_outputs = pd.DataFrame(np.nan, index=index, columns=column_names)

But I want to create an empty DataFrame with a different data type in each column. It seems the dtype keyword argument accepts only a single type.

Background: I am writing a script that generates data incrementally, so I need somewhere to store it while the script runs. I thought an empty DataFrame (large enough to hold all the expected data) would be the best way to do this. This must be a fairly common task, so if someone has a better way, please advise.

Upvotes: 2

Views: 4378

Answers (1)

aldanor

Reputation: 3481

One way to create an empty DataFrame with columns of different types is to pass an empty NumPy array with a suitable structured dtype:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.empty(0, dtype=[('a', 'u4'), ('b', 'S20'), ('c', 'f8')]))

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Data columns (total 3 columns):
a    0 non-null uint32
b    0 non-null object
c    0 non-null float64
dtypes: float64(1), object(1), uint32(1)
memory usage: 76.0+ bytes
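
An alternative that may be worth considering is to build the frame from a dict of per-column dtypes. The sketch below is only an illustration: the column names and dtypes mirror the example above, and `n_rows`, `empty_df` and `prealloc` are hypothetical names. It shows both a zero-row frame and a frame preallocated to a fixed number of rows, which seems closer to what the question describes.

```python
import numpy as np
import pandas as pd

# Hypothetical column names and dtypes, mirroring the example above.
dtypes = {"a": "uint32", "b": "object", "c": "float64"}

# Zero rows: one empty Series per column, each carrying its own dtype.
empty_df = pd.DataFrame({name: pd.Series(dtype=dt) for name, dt in dtypes.items()})

# Preallocated rows: one zero-filled array per column.
# n_rows is an assumed size; the object column holds the integer 0 as a placeholder.
n_rows = 5
prealloc = pd.DataFrame({name: np.zeros(n_rows, dtype=dt) for name, dt in dtypes.items()})

# Fill in a value as the script produces data.
prealloc.at[0, "c"] = 1.5
```

Note that the preallocated cells contain zeros rather than missing values, so the script has to track which rows have actually been filled.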

Upvotes: 3
