Reputation: 35
I am quite new to pandas, and I am trying to add a column to a dataframe when the new column comes with its own index.
For example, let's consider the following data:
kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
df = pd.DataFrame({"test":val}, index=kp)
Now I would like to add a new column to this dataframe, where the new column's index may differ from the one used in df:
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])
What I want: a resulting dataframe whose index merges kp and kp2.
kp2 and val2 have the same length, and kp and val have the same length, but kp and kp2 may have different lengths and, of course, different index values. I used kp as the index because I wanted the index to be unique, so that the indexes are merged when a new column is added. If there is a better solution, feel free to propose it. Thanks for your help.
Upvotes: 0
Views: 52
Reputation: 140
You are looking for the pandas merge method.
First, create a second dataframe the same way you created the first one:
df2 = pd.DataFrame({"test2":val2}, index=kp2)
Then merge the two using the DataFrame merge method:
ddf = df.merge(df2,how='outer',left_index=True,right_index=True,sort=True)
ddf
how='outer' merges the two dataframes in a fashion similar to SQL's FULL OUTER JOIN: every index value from either dataframe is kept, and missing entries become NaN. For other options and arguments, see the pandas merge docs.
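For reference, here is a self-contained sketch of the merge above using the data from the question. The join() call is an addition of mine, not part of the original answer: it is a shorter spelling that should give the same result for an index-on-index outer join, and I sort explicitly just to be safe.

```python
import numpy as np
import pandas as pd

# Data from the question
kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])

df = pd.DataFrame({"test": val}, index=kp)
df2 = pd.DataFrame({"test2": val2}, index=kp2)

# Outer merge on the index: keeps every index value from both frames
ddf = df.merge(df2, how="outer", left_index=True, right_index=True, sort=True)

# Alternative (my assumption that this is acceptable here): join() aligns on the index by default
joined = df.join(df2, how="outer").sort_index()

print(ddf)
```

Only index 4.0 appears in both kp and kp2, so that is the only row with values in both columns.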
Upvotes: 1
Reputation: 24324
import pandas as pd
import numpy as np
#your data:
kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])
df = pd.DataFrame({"test":val}, index=kp)
df2 = pd.DataFrame({"test2":val2}, index=kp2)
You can do this simply by using the concat() function:
result=pd.concat((df,df2),axis=1)
Finally, use the sort_index() method:
result=result.sort_index()
You can do both steps in one line:
result=pd.concat((df,df2),axis=1).sort_index()
#Output of result:
     test  test2
0.0   0.1    NaN
0.5   NaN    0.6
1.0   0.2    NaN
1.5   NaN    0.7
2.0   0.3    NaN
2.5   NaN    0.8
3.0   0.4    NaN
3.5   NaN    0.9
4.0   0.5    0.1
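If you only have one new column, a possible variation (a sketch of mine, not part of the answer above) is to skip the second dataframe and pass a named Series to concat(). The Series name becomes the column name. The variable name new_col is just illustrative.

```python
import numpy as np
import pandas as pd

# Data from the question
kp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
val = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
kp2 = np.array([0.5, 1.5, 2.5, 3.5, 4.0])
val2 = np.array([0.6, 0.7, 0.8, 0.9, 0.10])

df = pd.DataFrame({"test": val}, index=kp)

# The new column as a Series carrying its own index (hypothetical name: new_col)
new_col = pd.Series(val2, index=kp2, name="test2")

# axis=1 aligns on the index; the default join is 'outer', so all index values are kept
result = pd.concat([df, new_col], axis=1).sort_index()

print(result)
```

This should print the same table as above, with NaN wherever a column has no value for a given index.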
Upvotes: 2