Prachi
Prachi

Reputation: 554

How to replace data in one pandas df by the data of another one?

Want to replace some rows of some columns in a bigger pandas df by data in a smaller pandas df. The column names are same in both. Tried using combine_first but it only updates the null values.

For example lets say df1.shape is 100, 25 and df2.shape is 10,5 df1

A   B     C    D     E     F  G ...Z  Y  Z
1  abc  10.20  0   pd.NaT

df2

A   B      C    D  E
1  abc  15.20   1  10

Now after replacing df1 should look like:

A   B     C    D   E   F  G ...Z  Y  Z
1  abc  15.20  1   10 ...

To replace values in df1 the condition is where df1.A = df2.A and df1.B = df2.B

How can it be achieved in the most pythonic way? Any help will be appreciated.

Upvotes: 1

Views: 102

Answers (2)

gv12
gv12

Reputation: 21

You could play with Multiindex. First let us create those dataframe that you are working with:

cols = pd.Index(list(ascii_uppercase))
vals = np.arange(100*len(cols)).reshape(100, len(cols))

df = pd.DataFrame(vals, columns=cols)
df1 = pd.DataFrame(vals[:10,:5], columns=cols[:5])

Then transform A and B in indices:

df = df.set_index(["A","B"])
df1 = df1.set_index(["A","B"])*1.5 # multiply just to make the other values different

df.loc[df1.index, df1.columns] = df1
df = df.reset_index()

Upvotes: 0

Utpal Dutt
Utpal Dutt

Reputation: 403

Don't know I really understood your question does this solves your problem ?

df1 = pd.DataFrame(data={'A':[1],'B':[2],'C':[3],'D':[4]})
df2 = pd.DataFrame(data={'A':[1],'B':[2],'C':[5],'D':[6]})

new_df=pd.concat([df1,df2]).drop_duplicates(['A','B'],keep='last')
print(new_df)

output:

   A  B  C  D
0  1  2  5  6

Upvotes: 1

Related Questions