fizcris
fizcris

Reputation: 63

Pandas: Add row depending on index

I just want to create a dataFrame that is updated with itself(df3), adding rows from other dataFrames (df1,df2) based on an index ("ID").

When adding a new dataFrame if an overlap index is found, update the data. If it is not found, add the data including the new index.

df1 = pd.DataFrame({"Proj. Num" :["A"],'ID':[000],'DATA':["NO_DATA"]})
df1 = df1.set_index(["ID"])

df2 = pd.DataFrame({"Proj. Num" :["B"],'ID':[100],'DATA':["OK"], }) 
df2 = df2.set_index(["ID"])

df3 = pd.DataFrame({"Proj. Num" :["B"],'ID':[100],'DATA':["NO_OK"], }) 
df3 = df3.set_index(["ID"])

#df3 = pd.concat([df1,df2, df3]) #Concat,merge,join???
df3

I have tried concatenate with _verify_integrity=False_ but it just gives an error, and I think there is a more simple/nicer way to do it.

Upvotes: 2

Views: 197

Answers (3)

jezrael
jezrael

Reputation: 862511

Solution with concat + Index.duplicated for boolean mask and filter by boolean indexing:

df3 = pd.concat([df1, df2, df3])
df3 = df3[~df3.index.duplicated()]
print (df3)
        DATA Proj. Num
ID                    
0    NO_DATA         A
100       OK         B

Another solution by comment, thank you:

df3 = pd.concat([df3,df1]) 
df3 = df3[~df3.index.duplicated(keep='last')]
print (df3)
        DATA Proj. Num
ID                    
100    NO_OK         B
0    NO_DATA         A

Upvotes: 3

Allen Qin
Allen Qin

Reputation: 19947

#update existing rows 
df3.update(df1)
#append new rows
df3 = pd.concat([df3,df1[~df1.index.isin(df3.index)]])

#update existing rows 
df3.update(df2)
#append new rows
df3 = pd.concat([df3,df2[~df2.index.isin(df3.index)]])

Out[2438]: 
        DATA Proj. Num
ID                    
100       OK         B
0    NO_DATA         A

Upvotes: 0

FLab
FLab

Reputation: 7466

You can concatenate all the dataframes along the index; group by index and decide which element to keep of the group sharing the same index.

From your question it looks like you want to keep the last (most updated) element with the same index. It is then important the order in which you pass the dataframes in the pd.concat function. For a list of other methods, see here.

res = pd.concat([df1, df2, df3], axis = 0)
res.groupby(res.index).last()

Which gives:

        DATA Proj. Num
ID                    
0    NO_DATA         A
100    NO_OK         B

Upvotes: 1

Related Questions