EJ Kang

Reputation: 495

Python DataFrame: replacing values from DataFrame to other DataFrame with same index and columns

I have two DataFrames. "df" is my original DataFrame with 100000+ values, and "df_result" is another that contains only certain columns and certain index labels of "df". I have changed the values in the "df_result" columns and want to apply them back to my original DataFrame "df". The column names and index of "df_result" match those of "df", but "df_result" does not contain every index of "df" (e.g., df.index is [0,1,2,.....,92808,92809] and df_result.index is [23429,23430,32349,42099,45232,.....,91324,91423]). Is there an efficient way to write every value in "df_result" into the original "df" at the corresponding index and columns? Thank you!

Upvotes: 0

Views: 100

Answers (2)

jezrael

Reputation: 862531

You can use combine_first:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

df_result = pd.DataFrame({'A':list('abc'),
                   'B':[4,5,4],
                   'C':[7,9,3],
                   'D':[5,7,1],
                   'E':[5,3,6],
                   'F':list('klo')}, index=[2,4,5])

print (df_result)
   A  B  C  D  E  F
2  a  4  7  5  5  k
4  b  5  9  7  3  l
5  c  4  3  1  6  o

df = df_result.combine_first(df)
print (df)
   A    B    C    D    E  F
0  a  4.0  7.0  1.0  5.0  a
1  b  5.0  8.0  3.0  3.0  a
2  a  4.0  7.0  5.0  5.0  k
3  d  5.0  4.0  7.0  9.0  b
4  b  5.0  9.0  7.0  3.0  l
5  c  4.0  3.0  1.0  6.0  o

Another solution, which also works with NaNs, is to concatenate the DataFrames and remove rows with duplicated index values:

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df_result, df])
df = df[~df.index.duplicated()].sort_index()
print (df)

   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  a  4  7  5  5  k
3  d  5  4  7  9  b
4  b  5  9  7  3  l
5  c  4  3  1  6  o
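The difference between the two approaches shows up when df_result itself contains NaN. The following is a minimal sketch on toy data (the frames and the names filled, stacked and replaced are made up for illustration, not taken from the question): combine_first treats NaN in df_result as missing and falls back to the value in df, while concatenating and dropping duplicated index values keeps the whole df_result row, NaN included. It uses pd.concat, since DataFrame.append is not available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({'B': [4.0, 5.0, 4.0], 'F': list('aaa')})
df_result = pd.DataFrame({'B': [np.nan, 40.0], 'F': list('kl')}, index=[1, 2])

# combine_first: NaN in df_result is filled from df
filled = df_result.combine_first(df)

# concat + duplicated: df_result rows come first, so they win on duplicated index
stacked = pd.concat([df_result, df])
replaced = stacked[~stacked.index.duplicated()].sort_index()

print(filled)
print(replaced)
```

Which behaviour is wanted depends on whether NaN in df_result means "no change" or "set to missing".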

EDIT:

Does this also work with np.nan values? And what if df has more columns than df_result?

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[np.nan,4,8,9,4,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B    C  D  E  F
0  a  4  NaN  1  5  a
1  b  5  4.0  3  3  a
2  c  4  8.0  5  6  a
3  d  5  9.0  7  9  b
4  e  5  4.0  1  2  b
5  f  4  3.0  0  4  b

df_result = pd.DataFrame({'A':list('abc'),
                   'B':[np.nan,50,40],
                   'E':[50,30,60],
                   'F':list('klo')}, index=[2,4,5])

print (df_result)
   A     B   E  F
2  a   NaN  50  k
4  b  50.0  30  l
5  c  40.0  60  o

You can assign into df by index labels and column names with loc:

df.loc[df_result.index, df_result.columns] = df_result
print (df)
   A     B    C  D   E  F
0  a   4.0  NaN  1   5  a
1  b   5.0  4.0  3   3  a
2  a   NaN  8.0  5  50  k
3  d   5.0  9.0  7   9  b
4  b  50.0  4.0  1  30  l
5  c  40.0  3.0  0  60  o
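As a self-contained check of the loc assignment, here is a small sketch on made-up data (smaller than the frames above, with column B stored as float so the NaN needs no dtype change). It assumes, as in the question, that every index label and column of df_result exists in df.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.0, 5.0, 4.0, 5.0, 5.0, 4.0],
                   'C': [np.nan, 4, 8, 9, 4, 3]})
df_result = pd.DataFrame({'A': list('xyz'),
                          'B': [np.nan, 50.0, 40.0]}, index=[2, 4, 5])

# Overwrite only the cells addressed by df_result's index and columns
df.loc[df_result.index, df_result.columns] = df_result

print(df)
```

NaN values in df_result are copied over as well, and columns that are absent from df_result (C here) are left untouched.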

Upvotes: 1

hanego

Reputation: 1635

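DataFrame.update should work if df_result has no NaN values that need to be copied over. It modifies df in place and returns None, so do not assign the result back to df:

df.update(df_result)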

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html
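A minimal sketch of how DataFrame.update behaves, on toy data assumed for illustration (the variable ret exists only to show the return value): the call changes df in place, returns None, and leaves a cell alone wherever df_result holds NaN.

```python
import numpy as np
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({'B': [4.0, 5.0, 4.0, 5.0], 'E': [5.0, 3.0, 6.0, 9.0]})
df_result = pd.DataFrame({'B': [np.nan, 50.0], 'E': [50.0, 30.0]}, index=[1, 3])

ret = df.update(df_result)  # in place; ret is None

print(ret)
print(df)
```

Writing df = df.update(df_result) would therefore replace df with None.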

Upvotes: 0
