AlexW
AlexW

Reputation: 2591

Python Pandas - overwrite x rows from another data frame conditionally

I would like to overwrite an x number of rows from a data frame if a certain column is blank and only overwrite those single rows. my attempts below overwrite all records it seems not just the ones returned in the search

Table 1

>>> route_data
   circuit_id circuit_provider circuit_type   down errors        route            site  site_id           mask       next_hop
0        None              BOB         MPLS  False   None   10.10.94.0           HORSE        7  255.255.255.0    172.10.1.25
1        None              BOB         MPLS  False   None   10.10.82.0          LONDON        8  255.255.255.0    172.10.1.25
2        None             BILL         MPLS  False   None   10.10.25.0           BACON      128  255.255.255.0     172.1.1.21
3        None             BILL         MPLS  False   None    10.11.0.0            MANC        1    255.255.0.0            NaN
4        None              BOB         MPLS  False   None   10.10.66.0            YORK        9  255.255.255.0    172.10.1.25
5        None              BOB         MPLS  False   None   10.10.87.0           LIVER       10  255.255.255.0    172.10.1.25
6        None              BOB         MPLS  False   None  10.10.120.0            EGGS       11  255.255.255.0    172.10.1.25

Table 2

>>> device_route_data
   circuit_id circuit_provider circuit_type   down errors        route            site  site_id           mask       next_hop
0        None              BOB         MPLS  False   None   10.10.94.0           HORSE        7  255.255.255.0     172.17.5.1
1        None              BOB         MPLS  False   None   10.10.82.0          LONDON        8  255.255.255.0     172.17.5.1
2        None             BILL         MPLS  False   None   10.10.25.0           BACON      128  255.255.255.0   172.16.30.10
3        None             BILL         MPLS  False   None    10.11.0.0            MANC        1    255.255.0.0   172.16.30.10
4        None              BOB         MPLS  False   None   10.10.66.0            YORK        9  255.255.255.0     172.17.5.1
5        None              BOB         MPLS  False   None   10.10.87.0           LIVER       10  255.255.255.0     172.17.5.1
6        None              BOB         MPLS  False   None  10.10.120.0            EGGS       11  255.255.255.0     172.17.5.1

get all the nan entries and overwrite

route_data.loc[route_data.next_hop.str.match('nan'), route_data.columns] = device_route_data[device_route_data.columns]

just returns the Table 2 data in full

   circuit_id circuit_provider circuit_type   down errors        route            site  site_id           mask       next_hop
0        None              BOB         MPLS  False   None   10.10.94.0           HORSE        7  255.255.255.0     172.17.5.1
1        None              BOB         MPLS  False   None   10.10.82.0          LONDON        8  255.255.255.0     172.17.5.1
2        None             BILL         MPLS  False   None   10.10.25.0           BACON      128  255.255.255.0   172.16.30.10
3        None             BILL         MPLS  False   None    10.11.0.0            MANC        1    255.255.0.0   172.16.30.10
4        None              BOB         MPLS  False   None   10.10.66.0            YORK        9  255.255.255.0     172.17.5.1
5        None              BOB         MPLS  False   None   10.10.87.0           LIVER       10  255.255.255.0     172.17.5.1
6        None              BOB         MPLS  False   None  10.10.120.0            EGGS       11  255.255.255.0     172.17.5.1    

using

route_data.loc[route_data.next_hop.str.match('nan'), route_data.columns]

Succesfully gets the Nan records

   circuit_id circuit_provider circuit_type   down errors        route            site  site_id           mask       next_hop
3        None             BILL         MPLS  False   None    10.11.0.0            MANC        1    255.255.0.0            NaN

this is the only record i want to over write though, i wish all others to remain as they are, does anyone know what I am missing?

Thanks

EDIT:

I'm trying to do the same thing on circuit_type but I am getting blank results. does the .isnull() check work on None Types also?

sample:

    circuit_id circuit_provider circuit_type   down errors        route            site  site_id           mask      next_hop
0        None              BOB         MPLS  False   None   10.10.94.0           HORSE        7  255.255.255.0     172.17.5.1
1        None              BOB         MPLS  False   None   10.10.82.0          LONDON        8  255.255.255.0     172.17.5.1
2        None             BILL         MPLS  False   None   10.10.25.0           BACON      128  255.255.255.0   172.16.30.10
3        None             BILL         MPLS  False   None    10.11.0.0            MANC        1    255.255.0.0   172.16.30.10
4        None              BOB         MPLS  False   None   10.10.66.0            YORK        9  255.255.255.0     172.17.5.1
5        None              None        None  False   None   10.10.87.0           LIVER       10  255.255.255.0     172.17.5.1
6        None              BOB         MPLS  False   None  10.10.120.0            EGGS       11  255.255.255.0     172.17.5.1 

output

>>> route_data.loc[route_data.circuit_type.isnull(), :]
Empty DataFrame
Columns: [circuit_id, circuit_provider, circuit_type, down, errors, route, site, site_id, mask, next_hop]
Index: []
>>>

EDIT 2: further test, this works for finding the rows

route_data.loc[route_data.circuit_type.str.contains("None"), :]

however when I try to test and overwrite only the rows with circuit type to none as per the below, it just overwrites all the rows, so instead of the original table with the None rows altered, I just get the new table with none of the original data

f = route_data.loc[route_data.circuit_type.str.contains("None"), :] = device_route_data

Upvotes: 1

Views: 606

Answers (1)

ASGM
ASGM

Reputation: 11381

The values of next_hop that you're trying to select aren't the string "nan", but a special value known as "not a number" or NaN (note the mixed capitalization). Pandas has convenient functions for dealing with NaN and other null values, like isnull():

df1.loc[df.some_column.isnull(), :] = df2

With your names:

route_data.loc[route_data.next_hop.isnull(), :] = device_route_data

isnull() isolates the rows with NaN values in your column of interest. Because you're using .loc[], you can select all the columns using :, without having to specify them manually. And you don't need to select all the columns from the second dataframe either - it will use them all by default.

This question and answer is similar to this one: Pandas replace all items in a row with NaN if one value is NaN.

Upvotes: 2

Related Questions