Reputation: 495
I have two dataframes. "df" is my original dataframe with 100000+ values and "df_result" is another that contains only certain columns and certain indexes of df. I have changed the values in the "df_result" columns and want to apply them back to my original dataframe "df". The column names and index of "df_result" match those of "df", but it does not contain every index of "df" (e.g., df.index is [0,1,2,.....,92808,92809] and df_result.index is [23429,23430,32349,42099,45232,.....,91324,91423]). Is there an efficient way to write every value in "df_result" back to the original "df" at the corresponding index and columns? Thank you!
Upvotes: 0
Views: 100
Reputation: 862531
You can use combine_first:
import pandas as pd
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
df_result = pd.DataFrame({'A':list('abc'),
'B':[4,5,4],
'C':[7,9,3],
'D':[5,7,1],
'E':[5,3,6],
'F':list('klo')}, index=[2,4,5])
print (df_result)
A B C D E F
2 a 4 7 5 5 k
4 b 5 9 7 3 l
5 c 4 3 1 6 o
df = df_result.combine_first(df)
print (df)
A B C D E F
0 a 4.0 7.0 1.0 5.0 a
1 b 5.0 8.0 3.0 3.0 a
2 a 4.0 7.0 5.0 5.0 k
3 d 5.0 4.0 7.0 9.0 b
4 b 5.0 9.0 7.0 3.0 l
5 c 4.0 3.0 1.0 6.0 o
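One caveat that a small sketch may make clearer: combine_first only takes non-null values from the calling frame, so a NaN in df_result does not overwrite the value in df. The frames below are made-up illustration data, not the ones from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames for illustration only.
df = pd.DataFrame({"B": [4, 5, 4, 5], "C": [7, 8, 9, 4]})
df_result = pd.DataFrame({"B": [40.0, np.nan]}, index=[1, 3])

# Values from df_result win where they are non-null;
# everything else (including the NaN at index 3) falls back to df.
combined = df_result.combine_first(df)
print(combined)
```

So if a NaN in df_result is meant to replace a real value in df, combine_first is probably not the right tool.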
Another solution, which also works with NaNs, is to concatenate the DataFrames and remove duplicate rows by index:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df_result, df])
df = df[~df.index.duplicated()].sort_index()
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 a 4 7 5 5 k
3 d 5 4 7 9 b
4 b 5 9 7 3 l
5 c 4 3 1 6 o
EDIT:
Does this also work with np.nan values? And what if df has more columns than df_result?
import numpy as np
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[np.nan,4,8,9,4,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 NaN 1 5 a
1 b 5 4.0 3 3 a
2 c 4 8.0 5 6 a
3 d 5 9.0 7 9 b
4 e 5 4.0 1 2 b
5 f 4 3.0 0 4 b
df_result = pd.DataFrame({'A':list('abc'),
'B':[np.nan,50,40],
'E':[50,30,60],
'F':list('klo')}, index=[2,4,5])
print (df_result)
A B E F
2 a NaN 50 k
4 b 50.0 30 l
5 c 40.0 60 o
You can assign into df by index and column names with loc:
df.loc[df_result.index, df_result.columns] = df_result
print (df)
A B C D E F
0 a 4.0 NaN 1 5 a
1 b 5.0 4.0 3 3 a
2 a NaN 8.0 5 50 k
3 d 5.0 9.0 7 9 b
4 b 50.0 4.0 1 30 l
5 c 40.0 3.0 0 60 o
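A minimal sketch of the loc assignment with a guard added, assuming every index label of df_result already exists in df (the check and the sample data are illustrative additions, not part of the original answer). The target column B is float here so that writing NaN into it does not require a dtype change.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames for illustration only.
df = pd.DataFrame({"B": [4.0, 5.0, 4.0, 5.0],
                   "C": [7, 8, 9, 4],
                   "E": [5, 3, 6, 9]})
df_result = pd.DataFrame({"B": [np.nan, 50.0], "E": [50, 30]}, index=[1, 3])

# Guard: loc assignment expects these labels to be present in df.
missing = df_result.index.difference(df.index)
if not missing.empty:
    raise KeyError(f"labels not found in df: {list(missing)}")

# Overwrite only the matching rows and columns; NaN is written as well.
df.loc[df_result.index, df_result.columns] = df_result
print(df)
```

Column C is not in df_result, so it is left untouched.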
Upvotes: 1
Reputation: 1635
This function should work if df_result doesn't have any NA values (update modifies df in place and returns None, so don't assign the result back):
df.update(df_result)
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html
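To illustrate both points with made-up data: update works in place and returns None, and NaN values in df_result are skipped rather than written into df.

```python
import numpy as np
import pandas as pd

# Hypothetical small frames for illustration only.
df = pd.DataFrame({"B": [4.0, 5.0, 4.0, 5.0], "C": [7, 8, 9, 4]})
df_result = pd.DataFrame({"B": [np.nan, 50.0]}, index=[1, 3])

# update mutates df and returns None.
ret = df.update(df_result)
print(ret)
print(df)
```

Index 1 keeps its old value because the corresponding entry in df_result is NaN; index 3 is overwritten with 50.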
Upvotes: 0