Avatar36

Reputation: 35

How to add a column to a dataframe by merging index?

I am quite new to pandas and I am trying to add a column to a dataframe, considering that the new column has its own index.

For example, let's consider the following data:

import numpy as np
import pandas as pd

kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
df = pd.DataFrame({"test":val}, index=kp)

Now I would like to add a new column to this dataframe, given that the new column's index may differ from the one used in df:

kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])

What I want: the resulting dataframe [image]

kp2 and val2 have the same length, as do kp and val, but kp and kp2 may differ in length and, of course, in their values. I used kp as the index because I wanted the index to be unique, so that index values are merged when a new column is added. If there is a better solution, feel free to propose it. Thanks for your help.

Upvotes: 0

Views: 52

Answers (2)

G.MAHESH

Reputation: 140

You are looking for the pandas merge method.

Create a second dataframe the same way you created the first one:

df2 = pd.DataFrame({"test2":val2}, index=kp2)

Then merge the two using the dataframe's merge method:

ddf = df.merge(df2,how='outer',left_index=True,right_index=True,sort=True)
ddf

how='outer' merges the two dataframes in a fashion similar to SQL's FULL OUTER JOIN: every index value from either dataframe is kept, and missing entries are filled with NaN. For other options and arguments, see the pandas merge docs.
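To make the effect of the how argument concrete, here is a minimal, self-contained sketch using the data from the question. The names inner and outer are only illustrative, and the comparison with how='inner' is added here for contrast; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])

df = pd.DataFrame({"test": val}, index=kp)
df2 = pd.DataFrame({"test2": val2}, index=kp2)

# Outer join on the index: keeps the union of both indexes, NaN where missing.
outer = df.merge(df2, how="outer", left_index=True, right_index=True, sort=True)

# Inner join on the index: keeps only index values present in both frames.
inner = df.merge(df2, how="inner", left_index=True, right_index=True)

print(outer)
print(inner)
```

With this data the outer join should keep all nine distinct index values, while the inner join should keep only 4.0, the one value the two indexes share.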

Upvotes: 1

Anurag Dabas

Reputation: 24324

import pandas as pd
import numpy as np

#your data:
kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])
df = pd.DataFrame({"test":val}, index=kp)
df2 = pd.DataFrame({"test2":val2}, index=kp2) 

You can do this simply by using the concat() method, which aligns both dataframes on their index:

result=pd.concat((df,df2),axis=1)

Finally, use the sort_index() method to put the combined index in order:

result=result.sort_index()

You can do both steps in one line:

result=pd.concat((df,df2),axis=1).sort_index()

#Output of result:

     test  test2
0.0   0.1    NaN
0.5   NaN    0.6
1.0   0.2    NaN
1.5   NaN    0.7
2.0   0.3    NaN
2.5   NaN    0.8
3.0   0.4    NaN
3.5   NaN    0.9
4.0   0.5    0.1
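Since the question asks about adding one column at a time and relies on the index being unique, the concat approach can be wrapped in a small helper. This is only a sketch: add_column_by_index is a hypothetical name, not a pandas function, and the uniqueness check is an assumption about what the asker wants to guarantee.

```python
import numpy as np
import pandas as pd

def add_column_by_index(df, index, values, name):
    """Hypothetical helper: add a column, aligning it on the index (outer join)."""
    new_col = pd.Series(values, index=index, name=name)
    # Alignment on a non-unique index is ambiguous, so fail early.
    if not (df.index.is_unique and new_col.index.is_unique):
        raise ValueError("both indexes must be unique")
    # concat with axis=1 aligns on the index; sort_index orders the union.
    return pd.concat((df, new_col), axis=1).sort_index()

df = pd.DataFrame({"test": np.array([0.1, 0.2, 0.3, 0.4, 0.5])},
                  index=np.array([0.0, 1.0, 2.0, 3.0, 4.0]))
result = add_column_by_index(
    df,
    index=np.array([0.5, 1.5, 2.5, 3.5, 4.0]),
    values=np.array([0.6, 0.7, 0.8, 0.9, 0.10]),
    name="test2",
)
print(result)
```

Calling the helper again on result with another index and set of values would add a third column in the same way.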

Upvotes: 2
