Mengezi Dhlomo

Reputation: 335

Getting NaNs instead of the correct values in a DataFrame column

I created a dataframe of zeros using this syntax:

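import numpy as np
import pandas as pd

# One row per row of actual_df, six zero-filled float32 columns
ltv = pd.DataFrame(data=np.zeros([actual_df.shape[0], 6]),
                   columns=['customer_id',
                            'actual_total',
                            'predicted_num_purchases',
                            'predicted_value',
                            'predicted_total',
                            'error'], dtype=np.float32)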

It comes out exactly as expected:

   customer_id  actual_total  predicted_num_purchases  predicted_value  predicted_total  error
0          0.0           0.0                      0.0              0.0              0.0    0.0
1          0.0           0.0                      0.0              0.0              0.0    0.0
2          0.0           0.0                      0.0              0.0              0.0    0.0

When I then run this assignment:

ltv['customer_id'] = actual_df['customer_id']

I get all NaNs in ltv['customer_id']. What is causing this and how can I prevent it from happening?

NB: I also checked actual_df and there are no NaNs in it.

Upvotes: 1

Views: 51

Answers (2)

Celius Stingher

Reputation: 18367

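Another option (more complicated than jezrael's great answer) is to drop the zero-filled column with .drop() and then attach the real one with pd.concat(). Note that pd.concat() also aligns on the index, so reset the index of actual_df first. With axis=1, ignore_index=True would replace the column names with 0, 1, 2 and so on, so it is left out here:

ltv = pd.concat([ltv.drop(columns=['customer_id']),
                 actual_df[['customer_id']].reset_index(drop=True)],
                axis=1)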

Upvotes: 0

jezrael

Reputation: 862406

You need the same index values in both DataFrames (and also the same length), because assigning a Series to a column aligns the values on the index.
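
To see why the index matters, here is a minimal sketch with made-up data. The customer IDs and the index labels 10, 11, 12 are assumptions for illustration and are not taken from the question. Labels that exist in ltv but not in actual_df end up as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for actual_df: its index is 10, 11, 12 rather than 0, 1, 2
actual_df = pd.DataFrame({'customer_id': [101.0, 102.0, 103.0]},
                         index=[10, 11, 12])

# ltv gets the default RangeIndex 0, 1, 2
ltv = pd.DataFrame(np.zeros([actual_df.shape[0], 2]),
                   columns=['customer_id', 'error'], dtype=np.float32)

# Assignment aligns on index labels; none of 0, 1, 2 exist in actual_df
ltv['customer_id'] = actual_df['customer_id']

all_nan = bool(ltv['customer_id'].isna().all())
print(all_nan)
```

Comparing actual_df.index with ltv.index is a quick way to check whether this is what is happening in your data.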

So the first solution is to create a default RangeIndex in actual_df; in ltv no index is specified, so a RangeIndex is created by default:

actual_df = actual_df.reset_index(drop=True)
ltv['customer_id'] = actual_df['customer_id']

Or pass the index parameter to the DataFrame constructor:

ltv = pd.DataFrame(data=np.zeros([actual_df.shape[0], 6]),
                        columns=['customer_id',
                                'actual_total',
                                'predicted_num_purchases',
                                'predicted_value',
                                'predicted_total',
                                'error'], dtype=np.float32,
                        index=actual_df.index)

ltv['customer_id'] = actual_df['customer_id']
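
A further possibility, sketched here under the assumption that both DataFrames have the same number of rows in the same order, is to assign the underlying NumPy array so that no index alignment takes place. The sample data below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical actual_df with a non-default index
actual_df = pd.DataFrame({'customer_id': [101.0, 102.0, 103.0]},
                         index=[10, 11, 12])

ltv = pd.DataFrame(np.zeros([actual_df.shape[0], 2]),
                   columns=['customer_id', 'error'], dtype=np.float32)

# .to_numpy() drops the index, so values are assigned purely by position
ltv['customer_id'] = actual_df['customer_id'].to_numpy()

result = ltv['customer_id'].tolist()
print(result)
```

This keeps ltv's own index untouched, but it silently pairs up the wrong rows if the order differs, so the index-based solutions are safer when the labels carry meaning.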

Upvotes: 2
