Leb

Reputation: 15953

Comparing and replacing values inside DataFrames

I have two DataFrames: one is the main table used as the "key", and the other is being updated because it has missing information.

main_df:

+---------+--------+--------+--------+--------+
| ID      | value1 | value2 | value3 | value4 |
+=========+========+========+========+========+
| 9845213 | 1      | 11     | a      | aa     |
+---------+--------+--------+--------+--------+
| 545167  | 2      | 22     | b      | bb     |
+---------+--------+--------+--------+--------+
| 132498  | 3      | 33     | c      | cc     |
+---------+--------+--------+--------+--------+
| 89465   | 4      | 44     | d      | dd     |
+---------+--------+--------+--------+--------+
| 871564  | 5      | 55     | e      | ee     |
+---------+--------+--------+--------+--------+
| 646879  | 6      | 66     | f      | ff     |
+---------+--------+--------+--------+--------+
...

data_df:

+----------+--------+--------+--------+--------+--------+
| ID       | value1 | value2 | value3 | value4 | value5 |
+==========+========+========+========+========+========+
| 4968712  | NaN    | NaN    | a      | aa     | a1     |
+----------+--------+--------+--------+--------+--------+
| 21347987 | 2      | 22     | b      | bb     | b2     |
+----------+--------+--------+--------+--------+--------+
| 4168512  | NaN    | NaN    | c      | cc     | c3     |
+----------+--------+--------+--------+--------+--------+
| 31468612 | 4      | 44     | d      | dd     | d4     |
+----------+--------+--------+--------+--------+--------+
| 9543213  | 5      | 55     | e      | ee     | e5     |
+----------+--------+--------+--------+--------+--------+
| 324798   | NaN    | NaN    | f      | ff     | f6     |
+----------+--------+--------+--------+--------+--------+

What I'm trying to do is use value3 and value4 from main_df as the lookup key in order to update only value1 and value2 in data_df.

None of the approaches in Merge, join, and concatenate would work for me, since I need to keep the two files separate.

I tried the Working with missing data guide and .replace(), but I'm not sure how to properly extract the values needed from main_df to replace the NaN values in data_df.

Upvotes: 0

Views: 61

Answers (1)

Jianxun Li

Reputation: 24742

Try the following code, which uses the update() function.

import pandas as pd

# main_df is the reference ("key") table; data_df is the one with missing values
main_df = pd.read_csv('/home/Jian/Downloads/main.txt', sep='|')
data_df = pd.read_csv('/home/Jian/Downloads/data.csv')

Out[229]: 
      ID      LAT     LONG                        CITY STATE                 TIME
0  12345      NaN      NaN           Cape Hinchinbrook    AK  2015-06-27 21:03:19
1  12346      NaN      NaN              Delenia Island    AK  2015-06-27 21:03:19
2  12347  29.7401 -95.4636                     Houston    TX  2015-06-27 21:03:19
3  12348  41.7132 -83.7032                    Sylvania    OH  2015-06-27 21:03:19
4  12349      NaN      NaN                  Alaskaland    AK  2015-06-27 21:03:19
5  12350      NaN      NaN  Badger Road Baptist Church    AK  2015-06-27 21:03:19

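# Keep only the columns to copy plus the matching keys, renamed to data_df's column names
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC','FEATURE_NAME', 'STATE_ALPHA']]
main_df_part.columns = ['LAT', 'LONG', 'CITY', 'STATE']
# Index by the matching keys so that update() can align rows on (CITY, STATE)
main_df_part = main_df_part.set_index(['CITY', 'STATE'])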

Out[230]: 
                                      LAT      LONG
CITY                       STATE                   
Pacific Ocean              CA     39.3103 -123.8447
Cape Hinchinbrook          AK     60.2347 -146.6417
Delenia Island             AK     60.3394 -148.1383
Alaskaland                 AK     64.8394 -147.7700
Badger Road Baptist Church AK     64.8167 -147.5661
Barnes Creek               AK     65.0014 -147.2939
Barnette Magnet School     AK     64.8383 -147.7300
Bentley Park               AK     64.8364 -147.6942

# Give data_df the same (CITY, STATE) index
data_df = data_df.set_index(['CITY', 'STATE'])

Out[233]: 
                                     ID      LAT     LONG                 TIME
CITY                       STATE                                              
Cape Hinchinbrook          AK     12345      NaN      NaN  2015-06-27 21:03:19
Delenia Island             AK     12346      NaN      NaN  2015-06-27 21:03:19
Houston                    TX     12347  29.7401 -95.4636  2015-06-27 21:03:19
Sylvania                   OH     12348  41.7132 -83.7032  2015-06-27 21:03:19
Alaskaland                 AK     12349      NaN      NaN  2015-06-27 21:03:19
Badger Road Baptist Church AK     12350      NaN      NaN  2015-06-27 21:03:19


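# update() aligns on the index and modifies data_df in place (it returns None).
# By default it overwrites with any non-NaN value from main_df_part;
# pass overwrite=False to fill only the NaN cells.
data_df.update(main_df_part)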

Out[235]: 
                                     ID      LAT      LONG                 TIME
CITY                       STATE                                               
Cape Hinchinbrook          AK     12345  60.2347 -146.6417  2015-06-27 21:03:19
Delenia Island             AK     12346  60.3394 -148.1383  2015-06-27 21:03:19
Houston                    TX     12347  29.7401  -95.4636  2015-06-27 21:03:19
Sylvania                   OH     12348  41.7132  -83.7032  2015-06-27 21:03:19
Alaskaland                 AK     12349  64.8394 -147.7700  2015-06-27 21:03:19
Badger Road Baptist Church AK     12350  64.8167 -147.5661  2015-06-27 21:03:19
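
The same idea applied to the column names from the question might look like the sketch below. The two frames are rebuilt by hand from the tables in the question, and the names keys, lookup, and original_columns are only illustrative. It passes overwrite=False so that only NaN cells are filled, and it leaves main_df untouched, so the two tables stay separate.

```python
import numpy as np
import pandas as pd

# Toy frames rebuilt from the tables in the question.
main_df = pd.DataFrame({
    "ID": [9845213, 545167, 132498, 89465, 871564, 646879],
    "value1": [1, 2, 3, 4, 5, 6],
    "value2": [11, 22, 33, 44, 55, 66],
    "value3": ["a", "b", "c", "d", "e", "f"],
    "value4": ["aa", "bb", "cc", "dd", "ee", "ff"],
})
data_df = pd.DataFrame({
    "ID": [4968712, 21347987, 4168512, 31468612, 9543213, 324798],
    "value1": [np.nan, 2, np.nan, 4, 5, np.nan],
    "value2": [np.nan, 22, np.nan, 44, 55, np.nan],
    "value3": ["a", "b", "c", "d", "e", "f"],
    "value4": ["aa", "bb", "cc", "dd", "ee", "ff"],
    "value5": ["a1", "b2", "c3", "d4", "e5", "f6"],
})

keys = ["value3", "value4"]               # columns used to match rows
original_columns = list(data_df.columns)  # remembered to restore the column order

# Lookup table: only the columns to copy, indexed by the matching keys.
lookup = main_df.set_index(keys)[["value1", "value2"]]

# Align on (value3, value4) and fill only the NaN cells of data_df.
data_df = data_df.set_index(keys)
data_df.update(lookup, overwrite=False)

# Back to a flat frame with the original column order.
data_df = data_df.reset_index()[original_columns]
print(data_df)
```

Because data_df ends up with ID, value5, and its original column order intact, it can be written back to its own file afterwards without ever being merged with main_df.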

Upvotes: 2
