Reputation: 15953
I have two DataFrames: one is the main DataFrame, used as the "key", and the other is the one to be updated because it is missing information.
main_df:
+---------+--------+--------+--------+--------+
| ID | value1 | value2 | value3 | value4 |
+=========+========+========+========+========+
| 9845213 | 1 | 11 | a | aa |
+---------+--------+--------+--------+--------+
| 545167 | 2 | 22 | b | bb |
+---------+--------+--------+--------+--------+
| 132498 | 3 | 33 | c | cc |
+---------+--------+--------+--------+--------+
| 89465 | 4 | 44 | d | dd |
+---------+--------+--------+--------+--------+
| 871564 | 5 | 55 | e | ee |
+---------+--------+--------+--------+--------+
| 646879 | 6 | 66 | f | ff |
+---------+--------+--------+--------+--------+
...
data_df:
+----------+--------+--------+--------+--------+--------+
| ID | value1 | value2 | value3 | value4 | value5 |
+==========+========+========+========+========+========+
| 4968712 | NaN | NaN | a | aa | a1 |
+----------+--------+--------+--------+--------+--------+
| 21347987 | 2 | 22 | b | bb | b2 |
+----------+--------+--------+--------+--------+--------+
| 4168512 | NaN | NaN | c | cc | c3 |
+----------+--------+--------+--------+--------+--------+
| 31468612 | 4 | 44 | d | dd | d4 |
+----------+--------+--------+--------+--------+--------+
| 9543213 | 5 | 55 | e | ee | e5 |
+----------+--------+--------+--------+--------+--------+
| 324798 | NaN | NaN | f | ff | f6 |
+----------+--------+--------+--------+--------+--------+
What I'm trying to do is use value3 and value4 from main_df in order to update only value1 and value2 in data_df.
None of the approaches in "Merge, join, and concatenate" would work for me, since I need to keep the two files separate.
I tried the approaches in "Working with missing data" as well as .replace(), but I'm not sure how to properly extract the values needed from main_df to replace the NaN values in data_df.
Upvotes: 0
Views: 61
Reputation: 24742
Try the following code, which uses the update() function.
import numpy as np
import pandas as pd
main_df = pd.read_csv('/home/Jian/Downloads/main.txt', sep='|')
data_df = pd.read_csv('/home/Jian/Downloads/data.csv')
Out[229]:
ID LAT LONG CITY STATE TIME
0 12345 NaN NaN Cape Hinchinbrook AK 2015-06-27 21:03:19
1 12346 NaN NaN Delenia Island AK 2015-06-27 21:03:19
2 12347 29.7401 -95.4636 Houston TX 2015-06-27 21:03:19
3 12348 41.7132 -83.7032 Sylvania OH 2015-06-27 21:03:19
4 12349 NaN NaN Alaskaland AK 2015-06-27 21:03:19
5 12350 NaN NaN Badger Road Baptist Church AK 2015-06-27 21:03:19
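# Keep only the columns needed from main_df
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']]
# Rename them to match the column names in data_df
main_df_part.columns = ['LAT', 'LONG', 'CITY', 'STATE']
# Index by the key columns so that update() can align the rows
main_df_part = main_df_part.set_index(['CITY', 'STATE'])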
Out[230]:
LAT LONG
CITY STATE
Pacific Ocean CA 39.3103 -123.8447
Cape Hinchinbrook AK 60.2347 -146.6417
Delenia Island AK 60.3394 -148.1383
Alaskaland AK 64.8394 -147.7700
Badger Road Baptist Church AK 64.8167 -147.5661
Barnes Creek AK 65.0014 -147.2939
Barnette Magnet School AK 64.8383 -147.7300
Bentley Park AK 64.8364 -147.6942
data_df = data_df.set_index(['CITY', 'STATE'])
Out[233]:
ID LAT LONG TIME
CITY STATE
Cape Hinchinbrook AK 12345 NaN NaN 2015-06-27 21:03:19
Delenia Island AK 12346 NaN NaN 2015-06-27 21:03:19
Houston TX 12347 29.7401 -95.4636 2015-06-27 21:03:19
Sylvania OH 12348 41.7132 -83.7032 2015-06-27 21:03:19
Alaskaland AK 12349 NaN NaN 2015-06-27 21:03:19
Badger Road Baptist Church AK 12350 NaN NaN 2015-06-27 21:03:19
# update() modifies data_df in place (it returns None), aligning on the (CITY, STATE) index
data_df.update(main_df_part)
Out[235]:
ID LAT LONG TIME
CITY STATE
Cape Hinchinbrook AK 12345 60.2347 -146.6417 2015-06-27 21:03:19
Delenia Island AK 12346 60.3394 -148.1383 2015-06-27 21:03:19
Houston TX 12347 29.7401 -95.4636 2015-06-27 21:03:19
Sylvania OH 12348 41.7132 -83.7032 2015-06-27 21:03:19
Alaskaland AK 12349 64.8394 -147.7700 2015-06-27 21:03:19
Badger Road Baptist Church AK 12350 64.8167 -147.5661 2015-06-27 21:03:19
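Note that by default update() overwrites every cell for which main_df_part has a non-NaN value, not just the NaN cells. If only the missing values should be filled, passing overwrite=False should do it. Below is a minimal sketch of the same approach applied to the question's value1 to value4 columns. The toy rows are copied from the question's tables, the names keys, lookup and filled are only illustrative, and it assumes each (value3, value4) pair occurs only once in main_df.

```python
import numpy as np
import pandas as pd

# Toy frames using the first three rows of the question's tables
main_df = pd.DataFrame({
    "ID": [9845213, 545167, 132498],
    "value1": [1, 2, 3],
    "value2": [11, 22, 33],
    "value3": ["a", "b", "c"],
    "value4": ["aa", "bb", "cc"],
})
data_df = pd.DataFrame({
    "ID": [4968712, 21347987, 4168512],
    "value1": [np.nan, 2, np.nan],
    "value2": [np.nan, 22, np.nan],
    "value3": ["a", "b", "c"],
    "value4": ["aa", "bb", "cc"],
    "value5": ["a1", "b2", "c3"],
})

# Columns that identify matching rows in both frames
keys = ["value3", "value4"]

# Lookup table: only the columns to copy, indexed by the key columns
# (assumption: the key pairs are unique in main_df)
lookup = main_df.set_index(keys)[["value1", "value2"]]

# set_index returns a new frame, so data_df itself is left untouched
filled = data_df.set_index(keys)

# overwrite=False fills only the NaN cells and keeps existing values
filled.update(lookup, overwrite=False)

# Restore the original column order
filled = filled.reset_index()[data_df.columns]
print(filled)
```

Since main_df's ID column is not part of lookup, the ID values in data_df are not touched. If the result should replace data_df, assign it back with data_df = filled.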
Upvotes: 2