Reputation: 63
I just want to create a dataFrame that is updated with itself(df3), adding rows from other dataFrames (df1,df2) based on an index ("ID").
When adding a new dataFrame if an overlap index is found, update the data. If it is not found, add the data including the new index.
df1 = pd.DataFrame({"Proj. Num" :["A"],'ID':[000],'DATA':["NO_DATA"]})
df1 = df1.set_index(["ID"])
df2 = pd.DataFrame({"Proj. Num" :["B"],'ID':[100],'DATA':["OK"], })
df2 = df2.set_index(["ID"])
df3 = pd.DataFrame({"Proj. Num" :["B"],'ID':[100],'DATA':["NO_OK"], })
df3 = df3.set_index(["ID"])
#df3 = pd.concat([df1,df2, df3]) #Concat,merge,join???
df3
I have tried concatenate with _verify_integrity=False_ but it just gives an error, and I think there is a more simple/nicer way to do it.
Upvotes: 2
Views: 197
Reputation: 862511
Solution with concat
+ Index.duplicated
for boolean mask and filter by boolean indexing
:
df3 = pd.concat([df1, df2, df3])
df3 = df3[~df3.index.duplicated()]
print (df3)
DATA Proj. Num
ID
0 NO_DATA A
100 OK B
Another solution by comment, thank you:
df3 = pd.concat([df3,df1])
df3 = df3[~df3.index.duplicated(keep='last')]
print (df3)
DATA Proj. Num
ID
100 NO_OK B
0 NO_DATA A
Upvotes: 3
Reputation: 19947
#update existing rows
df3.update(df1)
#append new rows
df3 = pd.concat([df3,df1[~df1.index.isin(df3.index)]])
#update existing rows
df3.update(df2)
#append new rows
df3 = pd.concat([df3,df2[~df2.index.isin(df3.index)]])
Out[2438]:
DATA Proj. Num
ID
100 OK B
0 NO_DATA A
Upvotes: 0
Reputation: 7466
You can concatenate all the dataframes along the index; group by index and decide which element to keep of the group sharing the same index.
From your question it looks like you want to keep the last (most updated) element with the same index. It is then important the order in which you pass the dataframes in the pd.concat
function.
For a list of other methods, see here.
res = pd.concat([df1, df2, df3], axis = 0)
res.groupby(res.index).last()
Which gives:
DATA Proj. Num
ID
0 NO_DATA A
100 NO_OK B
Upvotes: 1