Reputation: 223
I want compare two dataframes which is df1 and df2. df1 is a data that updates every hour by it self. df2 is a dataframe that alreay exists. I want to append specific row that is updated.
for example, Here is df1
df1:
which contains 5 rows of information
and df2 which already existed
df2:
we can tell that eric is added but df2 is not representing that.
I could overwrite df2 with df1 but I shouldn't as there will be remark that will be updated by person later the data is written.
So, I decided to remove each rows of data by finding it from df2 by its id, and remove them with for loop
and after that, there will be only eric's row will remain which will make me possible to just append eric to df2.
So what I tried is this
for index, row in df1.iterrows():
id = row['id']
if df2.loc[df1['id'].isin(id)] = True:
df1[df1.id != id)
and it returns syntax error....
am I on the right track? is it will be the best solution to solve this problem?and how should I change the code to achieve my goal?
Upvotes: 1
Views: 1991
Reputation: 294258
Let's assume that 'steve'
has a remark that we want to preserve in df1
and that 'jack'
has a remark we want to save in df2
. We can set the indices of each dataframe to ['id', 'name']
and use pd.Series.combine_first
Setup
df1 = pd.DataFrame(dict(
id=[12, 34, 56, 78, 90, 13],
name='james steve jack ted eric bob'.split(),
remark='',
))
df1.at[1, 'remark'] = 'meh'
df2 = pd.DataFrame(dict(
id=[12, 34, 56, 78, 13],
name='james steve jack ted bob'.split(),
remark='',
))
df2.at[2, 'remark'] = 'smart'
Solution
s1 = df1.set_index(['id', 'name']).remark
s2 = df2.set_index(['id', 'name']).remark
s1.mask(s1.eq('')).combine_first(s2.mask(s2.eq(''))).fillna('').reset_index()
id name remark
0 12 james
1 13 bob
2 34 steve meh
3 56 jack smart
4 78 ted
5 90 eric
However, supposing it was exactly as OP presented it!
Setup
df1 = pd.DataFrame(dict(
id=[12, 34, 56, 78, 90, 13],
name='james steve jack ted eric bob'.split(),
remark='',
))
df2 = pd.DataFrame(dict(
id=[12, 34, 56, 78, 13],
name='james steve jack ted bob'.split(),
remark='',
))
df2.at[2, 'remark'] = 'smart'
Solution
df2.append(df1).drop_duplicates(['id', 'name']).reset_index(drop=True)
id name remark
0 12 james
1 34 steve
2 56 jack smart
3 78 ted
4 13 bob
5 90 eric
Upvotes: 2
Reputation: 323226
To fix your code ...
l=[]
for index, row in df1.iterrows():
id = row['Id']
if sum(df2['Id'].isin([id]))>0:
l.append(id)
l
Out[334]: [0, 1, 2, 3, 4] # those are the row you need to remove
df1.loc[~df1.index.isin(l)]# you remove them by using `~` + .isin
Out[339]:
Id Name
5 5 F
6 6 G
By using pd.concat
pd.concat([df2,df1[~df1.Id.isin(df2.Id)]],axis=0)
Out[337]:
Id Name
0 0 A
1 1 B
2 2 C
3 3 D
4 4 E
5 5 F
6 6 G
Data Input
fake = {'Id' : [0,1,2,3,4,5,6],
'Name' : ['A','B','C','D','E','F','G']}
df1 = pd.DataFrame(fake)
fake = {'Id' : [0,1,2,3,4],
'Name' : ['A','B','C','D','E']}
df2 = pd.DataFrame(fake)
Upvotes: 2
Reputation: 1744
Pandas has several functions available that allow merging and joining different DataFrames. One you could use here is merge
: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
>>>merged = df1.merge(df2, how='left')
id name remark
0 234 james
1 212 steve
2 153 jack smart
3 567 ted
4 432 eric NaN
5 543 bob
If you didn't want the inserted value to be NaN
, you could always use fillna
: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html.
Upvotes: 0