Reputation: 335
I created a dataframe of zeros using this syntax:
ltv = pd.DataFrame(data=np.zeros([actual_df.shape[0], 6]),
columns=['customer_id',
'actual_total',
'predicted_num_purchases',
'predicted_value',
'predicted_total',
'error'], dtype=np.float32)
It comes out exactly as expected:
   customer_id  actual_total  predicted_num_purchases  predicted_value  predicted_total  error
0          0.0           0.0                      0.0              0.0              0.0    0.0
1          0.0           0.0                      0.0              0.0              0.0    0.0
2          0.0           0.0                      0.0              0.0              0.0    0.0
When I run this syntax:
ltv['customer_id'] = actual_df['customer_id']
I get all NaNs in ltv['customer_id']. What is causing this and how can I prevent it from happening?
NB: I also checked actual_df and there are no NaNs in it.
Upvotes: 1
Views: 51
Reputation: 18367
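Another option (more complicated than jezrael's great answer) is to drop the column with .drop() and then attach the new one with pd.concat(). Note that pd.concat() also aligns on the index, so reset the index of actual_df first. Do not pass ignore_index=True together with axis=1, because that replaces the column names with integers. With this approach customer_id ends up as the last column:
ltv = pd.concat([ltv.drop(columns=['customer_id']), actual_df[['customer_id']].reset_index(drop=True)], axis=1)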
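A minimal sketch of the concat approach, in case it helps to see it run end to end. The toy actual_df (three customers with a non-default index) and the shortened column list are assumptions made up for illustration, not the asker's data. The index of actual_df is reset before concatenating, and ignore_index=True is left out so the column names survive.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for actual_df with a non-default index (assumption).
actual_df = pd.DataFrame({"customer_id": [101, 102, 103]}, index=[10, 20, 30])

# Zero-filled frame with the default RangeIndex 0..2 (columns shortened for brevity).
ltv = pd.DataFrame(np.zeros((len(actual_df), 2)),
                   columns=["customer_id", "error"], dtype=np.float32)

# Drop the placeholder column, then attach the real one side by side.
# reset_index(drop=True) makes both pieces share the labels 0..2.
ltv = pd.concat(
    [ltv.drop(columns=["customer_id"]),
     actual_df[["customer_id"]].reset_index(drop=True)],
    axis=1,
)

print(ltv)
```

The new column is appended on the right; reorder with ltv[[...]] if the original column order matters.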
Upvotes: 0
Reputation: 862406
Both DataFrames need the same index values (and the same length), because assigning a column aligns the values on the index.
So the first solution is to create a default RangeIndex in actual_df. No index was specified for ltv, so it already has the default one:
actual_df = actual_df.reset_index(drop=True)
ltv['customer_id'] = actual_df['customer_id']
Or pass the index parameter to the DataFrame constructor:
ltv = pd.DataFrame(data=np.zeros([actual_df.shape[0], 6]),
columns=['customer_id',
'actual_total',
'predicted_num_purchases',
'predicted_value',
'predicted_total',
'error'], dtype=np.float32,
index=actual_df.index)
ltv['customer_id'] = actual_df['customer_id']
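To make the cause concrete, here is a small reproduction sketch. The toy actual_df (with index labels 10, 20, 30, as might be left over after filtering) and the shortened column list are assumptions for illustration only. It shows the all-NaN result when the index labels do not overlap, followed by both fixes.

```python
import numpy as np
import pandas as pd

# Hypothetical actual_df whose index is not 0..n-1 (assumption).
actual_df = pd.DataFrame({"customer_id": [101, 102, 103]}, index=[10, 20, 30])
cols = ["customer_id", "error"]  # shortened column list for brevity


def make_zeros(index=None):
    """Build a zero-filled float32 frame, optionally on a given index."""
    return pd.DataFrame(np.zeros((len(actual_df), len(cols))),
                        columns=cols, dtype=np.float32, index=index)


# Problem: ltv has labels 0, 1, 2 and actual_df has 10, 20, 30.
# Assignment aligns on labels, nothing matches, so every value is NaN.
ltv = make_zeros()
ltv["customer_id"] = actual_df["customer_id"]
all_nan = ltv["customer_id"].isna().all()

# Fix 1: give the source a default RangeIndex before assigning.
ltv_reset = make_zeros()
ltv_reset["customer_id"] = actual_df["customer_id"].reset_index(drop=True)

# Fix 2: build ltv on the same index as actual_df.
ltv_indexed = make_zeros(index=actual_df.index)
ltv_indexed["customer_id"] = actual_df["customer_id"]

print(all_nan)
print(ltv_reset)
print(ltv_indexed)
```

A quick check such as ltv.index.equals(actual_df.index) before assigning can confirm whether the two frames line up.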
Upvotes: 2