Pavel Komarov

Reputation: 1246

Adding entries sequentially to an empty pandas DataFrame

I am encountering pretty strange behavior. If I let

import pandas

# Named "data" rather than "dict" to avoid shadowing the built-in type
data = {'newcol': [1, 5], 'othercol': [12, -10]}
df = pandas.DataFrame(data=data)
print(df['newcol'])

I get back a pandas Series object with 1 and 5 in it. Great.

print(df)

I get back the DataFrame as I would expect. Cool.

But what if I want to add to a DataFrame a little at a time? (My use case is saving metrics for machine learner training runs happening in parallel, where each process gets a number and then adds to only that row of the DataFrame.)

I can do the following:

df = pandas.DataFrame()
df['newcol'] = pandas.Series()
df['othercol'] = pandas.Series()
df['newcol'].loc[0] = 1
df['newcol'].loc[1] = 5
df['othercol'].loc[0] = 12
df['othercol'].loc[1] = -10
print(df['newcol'])

I get back the pandas Series I would expect, identical to creating the DataFrame by the first method.

print(df)

I see printed that df is an Empty DataFrame with columns [newcol, othercol].

Clearly, the second method produces Series with the same contents as the first. So why does the DataFrame not know it is filled? Is there a function I can call to update the DataFrame's knowledge of its own Series, so that all these (possibly out-of-order) Series can be unified into a consistent DataFrame?

Upvotes: 0

Views: 617

Answers (1)

Vaishali

Reputation: 38415

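You can assign data to an empty DataFrame by indexing the DataFrame itself with df.loc[row, column]. The chained form df['newcol'].loc[0] = 1 writes to the Series returned by df['newcol'], so the DataFrame's own index is never enlarged and it still prints as empty. Indexing the DataFrame directly avoids that: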

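import pandas as pd

df = pd.DataFrame()
# An explicit dtype avoids relying on the default dtype of an empty Series
df['newcol'] = pd.Series(dtype=float)
df['othercol'] = pd.Series(dtype=float)
# Index the DataFrame itself so each assignment enlarges the frame
df.loc[0, 'newcol'] = 1
df.loc[1, 'newcol'] = 5
df.loc[0, 'othercol'] = 12
df.loc[1, 'othercol'] = -10
print(df)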

   newcol  othercol
0     1.0      12.0
1     5.0     -10.0
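
For the use case in the question (one row of metrics per parallel training run), another option is to collect the per-run results first and build the DataFrame once, rather than growing it cell by cell. The following is only a sketch: the `results` dictionary and its values are hypothetical stand-ins for whatever each process returns, keyed by the run number.

```python
import pandas as pd

# Hypothetical per-run results: run number -> metrics for that run.
# The runs may finish (and be recorded) in any order.
results = {
    1: {"newcol": 5, "othercol": -10},
    0: {"newcol": 1, "othercol": 12},
}

# Build the DataFrame in one step; orient="index" turns each key into a row label.
# sort_index() puts the rows back in run-number order.
df = pd.DataFrame.from_dict(results, orient="index").sort_index()
print(df)
```

Because the frame is built from complete records, the columns keep the dtype of the data (integers here) instead of being upcast to float while the frame is partially filled with NaN.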

Upvotes: 2
