Reputation: 1246
I am encountering pretty strange behavior. If I let
import pandas
data = {'newcol':[1,5], 'othercol':[12,-10]}
df = pandas.DataFrame(data=data)
print(df['newcol'])
I get back a pandas Series object with 1 and 5 in it. Great.
print(df)
I get back the DataFrame as I would expect. Cool.
But what if I want to fill a DataFrame a little at a time? (My use case is saving metrics for machine-learning training runs happening in parallel, where each process gets a number and then writes only to that row of the DataFrame.)
I can do the following:
df = pandas.DataFrame()
df['newcol'] = pandas.Series()
df['othercol'] = pandas.Series()
df['newcol'].loc[0] = 1
df['newcol'].loc[1] = 5
df['othercol'].loc[0] = 12
df['othercol'].loc[1] = -10
print(df['newcol'])
I get back the pandas Series I would expect, identical to creating the DataFrame by the first method.
print(df)
I see printed that df is an Empty DataFrame with columns [newcol, othercol].
Clearly the columns built by the second method hold the same values as those built by the first. So why does the DataFrame not know that it is filled? Is there a function I can call to update the DataFrame's knowledge of its own Series, so that all these (possibly out-of-order) Series can be unified into a consistent DataFrame?
Upvotes: 0
Views: 617
Reputation: 38415
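Assign through the DataFrame's own .loc instead of through the column Series. df['newcol'].loc[0] = 1 is chained indexing: it enlarges the Series returned by df['newcol'], while the DataFrame itself is not enlarged, which is why it still prints as empty. You can assign data to an empty DataFrame using the following: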
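import pandas as pd
df = pd.DataFrame()
# An explicit dtype avoids the object-dtype default of an empty Series in recent pandas
df['newcol'] = pd.Series(dtype='float64')
df['othercol'] = pd.Series(dtype='float64')
# Setting a (row, column) pair on df.loc adds the missing row to df itself
df.loc[0, 'newcol'] = 1
df.loc[1, 'newcol'] = 5
df.loc[0, 'othercol'] = 12
df.loc[1, 'othercol'] = -10
print(df)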
   newcol  othercol
0     1.0      12.0
1     5.0     -10.0
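If the rows come from separate processes and may arrive out of order, another option is to collect one dict per row and build the DataFrame in a single step at the end. This is only a rough sketch: the `results` dict and its contents are made up for illustration, and how each process reports its row is left out entirely. `DataFrame.from_dict` with `orient="index"` is a regular pandas call that turns each outer key into a row label.

```python
import pandas as pd

# Hypothetical per-run results keyed by row number; note they are out of order.
results = {
    1: {"newcol": 5, "othercol": -10},
    0: {"newcol": 1, "othercol": 12},
}

# Each outer key becomes a row label, each inner key a column.
# sort_index() puts the rows back in numeric order.
df = pd.DataFrame.from_dict(results, orient="index").sort_index()
print(df)
```

Building the frame once avoids growing it row by row, and the values keep their integer type because no placeholder NaN is ever inserted.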
Upvotes: 2