HungryBird

Reputation: 1147

Cannot add multiple columns with values in Python Pandas

I want to add the columns of reference to data, so I use

data[reference.columns]=reference

but it only creates the columns, with no values in them. How can I add the values too?

Upvotes: 2

Views: 679

Answers (1)

ALollz

Reputation: 59519

Your two DataFrames are indexed differently, so when you do data[reference.columns] = reference, pandas tries to align the new columns on the index. Since the index labels of reference are not in data (or only match at index 0), it adds the columns but fills the unmatched rows with NaN.

It looks like you want to add multiple static columns to data with the values from reference. You can just assign these:

for col in reference.columns:
    data[col] = reference[col].values[0]
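
As a minimal, self-contained sketch of that loop: the frames below are made up for illustration (the column names source and year and the index labels are hypothetical, since the original screenshots are not available), and it assumes reference holds a single row of constants.

```python
import pandas as pd

# Hypothetical data: `data` has a non-default index, `reference` has one row.
data = pd.DataFrame({'id': [1, 2, 3],
                     'val1': ['A', 'B', 'C']},
                    index=[10, 11, 12])
reference = pd.DataFrame({'source': ['lab'], 'year': [2019]})

for col in reference.columns:
    # .values[0] is a plain scalar, so there is no index to align on;
    # pandas broadcasts the scalar to every row of `data`.
    data[col] = reference[col].values[0]

print(data)
```

Because a scalar carries no index, every row of data gets the same value regardless of how either frame is indexed.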

Here's an illustration of the issue.

import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3, 4],
                     'val1': ['A', 'B', 'C', 'D']})
reference = pd.DataFrame({'id2': [1, 2, 3, 4],
                          'val2': ['A', 'B', 'C', 'D']})

Both have the same index, ranging from 0 to 3.

data[reference.columns] = reference

Outputs:

   id val1  id2 val2
0   1    A    1    A
1   2    B    2    B
2   3    C    3    C
3   4    D    4    D

But, if these DataFrames have different indices (that only partially overlap):

data = pd.DataFrame({'id': [1, 2, 3, 4],
                     'val1': ['A', 'B', 'C', 'D']})
reference = pd.DataFrame({'id2': [1, 2, 3, 4],
                          'val2': ['A', 'B', 'C', 'D']})
reference.index = [3, 4, 5, 6]

data[reference.columns] = reference

Outputs:

   id val1  id2 val2
0   1    A  NaN  NaN
1   2    B  NaN  NaN
2   3    C  NaN  NaN
3   4    D  1.0    A

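Only the index value 3 is shared, so only that row receives values. The id2 column also becomes float, because NaN cannot be stored in an integer column.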
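
If the two frames have the same number of rows and you want to match them by position rather than by label, one option is to assign the underlying arrays instead of the Series. This is only a sketch under that assumption, reusing the illustration's data; it is not necessarily what the asker's data calls for.

```python
import pandas as pd

data = pd.DataFrame({'id': [1, 2, 3, 4],
                     'val1': ['A', 'B', 'C', 'D']})
reference = pd.DataFrame({'id2': [1, 2, 3, 4],
                          'val2': ['A', 'B', 'C', 'D']},
                         index=[3, 4, 5, 6])

for col in reference.columns:
    # to_numpy() drops the index, so values are placed row by row
    # in order; the two frames must have the same length.
    data[col] = reference[col].to_numpy()

print(data)
```

Assigning column by column keeps each column's own dtype, and since no alignment happens, no NaN is introduced.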

Upvotes: 2
