How do I create an empty pandas DataFrame with different data types assigned to each column?

Question

This post provides an elegant way to create an empty pandas DataFrame of a specified data type. If you initialize it with np.nan values, the data type is set to float:

df_training_outputs = pd.DataFrame(np.nan, index=index, columns=column_names)

But I want to create an empty DataFrame with a different data type in each column. It seems the dtype keyword argument accepts only a single type.

Background: I am writing a script that generates data incrementally, so I need somewhere to store it while the script runs. I thought an empty DataFrame (large enough to hold all the expected data) would be the best way to do this. This must be a fairly common task, so if someone has a better way, please advise.

aldanor · Accepted Answer

One way you can create an empty dataframe with columns of different types is by providing an empty numpy array with a correct structured dtype:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.empty(0, dtype=[('a', 'u4'), ('b', 'S20'), ('c', 'f8')]))

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Data columns (total 3 columns):
a    0 non-null uint32
b    0 non-null object
c    0 non-null float64
dtypes: float64(1), object(1), uint32(1)
memory usage: 76.0+ bytes
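
Two related variations, offered as a minimal sketch rather than as part of the original answer: building the zero-row frame from one empty pd.Series per column, and preallocating a fixed number of rows (as the question's background describes) by passing np.zeros with the same kind of structured dtype. The column names, the row count n_rows, and the sample values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Variation 1: zero rows, one empty Series per column.
# Column names and dtypes are illustrative, mirroring the answer above.
column_dtypes = {"a": "uint32", "b": "object", "c": "float64"}
empty_df = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in column_dtypes.items()}
)

# Variation 2: preallocate n_rows rows with per-column dtypes.
# n_rows is a hypothetical upper bound on the number of records.
n_rows = 3
structured = np.zeros(n_rows, dtype=[("a", "u4"), ("b", "O"), ("c", "f8")])
prealloc_df = pd.DataFrame(structured)

# Fill the first row in place as data arrives.
prealloc_df.at[0, "a"] = 7
prealloc_df.at[0, "b"] = "first"
prealloc_df.at[0, "c"] = 1.5

print(empty_df.dtypes)
print(prealloc_df)
```

If the final row count is not known in advance, collecting records in a plain Python list and constructing the DataFrame once at the end is usually simpler than growing a frame row by row.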
