Reputation: 2591
I would like to overwrite an x number of rows from a data frame if a certain column is blank and only overwrite those single rows. my attempts below overwrite all records it seems not just the ones returned in the search
Table 1
>>> route_data
circuit_id circuit_provider circuit_type down errors route site site_id mask next_hop
0 None BOB MPLS False None 10.10.94.0 HORSE 7 255.255.255.0 172.10.1.25
1 None BOB MPLS False None 10.10.82.0 LONDON 8 255.255.255.0 172.10.1.25
2 None BILL MPLS False None 10.10.25.0 BACON 128 255.255.255.0 172.1.1.21
3 None BILL MPLS False None 10.11.0.0 MANC 1 255.255.0.0 NaN
4 None BOB MPLS False None 10.10.66.0 YORK 9 255.255.255.0 172.10.1.25
5 None BOB MPLS False None 10.10.87.0 LIVER 10 255.255.255.0 172.10.1.25
6 None BOB MPLS False None 10.10.120.0 EGGS 11 255.255.255.0 172.10.1.25
Table 2
>>> device_route_data
circuit_id circuit_provider circuit_type down errors route site site_id mask next_hop
0 None BOB MPLS False None 10.10.94.0 HORSE 7 255.255.255.0 172.17.5.1
1 None BOB MPLS False None 10.10.82.0 LONDON 8 255.255.255.0 172.17.5.1
2 None BILL MPLS False None 10.10.25.0 BACON 128 255.255.255.0 172.16.30.10
3 None BILL MPLS False None 10.11.0.0 MANC 1 255.255.0.0 172.16.30.10
4 None BOB MPLS False None 10.10.66.0 YORK 9 255.255.255.0 172.17.5.1
5 None BOB MPLS False None 10.10.87.0 LIVER 10 255.255.255.0 172.17.5.1
6 None BOB MPLS False None 10.10.120.0 EGGS 11 255.255.255.0 172.17.5.1
get all the nan entries and overwrite
route_data.loc[route_data.next_hop.str.match('nan'), route_data.columns] = device_route_data[device_route_data.columns]
just returns the Table 2 data in full
circuit_id circuit_provider circuit_type down errors route site site_id mask next_hop
0 None BOB MPLS False None 10.10.94.0 HORSE 7 255.255.255.0 172.17.5.1
1 None BOB MPLS False None 10.10.82.0 LONDON 8 255.255.255.0 172.17.5.1
2 None BILL MPLS False None 10.10.25.0 BACON 128 255.255.255.0 172.16.30.10
3 None BILL MPLS False None 10.11.0.0 MANC 1 255.255.0.0 172.16.30.10
4 None BOB MPLS False None 10.10.66.0 YORK 9 255.255.255.0 172.17.5.1
5 None BOB MPLS False None 10.10.87.0 LIVER 10 255.255.255.0 172.17.5.1
6 None BOB MPLS False None 10.10.120.0 EGGS 11 255.255.255.0 172.17.5.1
using
route_data.loc[route_data.next_hop.str.match('nan'), route_data.columns]
Succesfully gets the Nan records
circuit_id circuit_provider circuit_type down errors route site site_id mask next_hop
3 None BILL MPLS False None 10.11.0.0 MANC 1 255.255.0.0 NaN
this is the only record i want to over write though, i wish all others to remain as they are, does anyone know what I am missing?
Thanks
EDIT:
I'm trying to do the same thing on circuit_type but I am getting blank results. does the .isnull() check work on None Types also?
sample:
circuit_id circuit_provider circuit_type down errors route site site_id mask next_hop
0 None BOB MPLS False None 10.10.94.0 HORSE 7 255.255.255.0 172.17.5.1
1 None BOB MPLS False None 10.10.82.0 LONDON 8 255.255.255.0 172.17.5.1
2 None BILL MPLS False None 10.10.25.0 BACON 128 255.255.255.0 172.16.30.10
3 None BILL MPLS False None 10.11.0.0 MANC 1 255.255.0.0 172.16.30.10
4 None BOB MPLS False None 10.10.66.0 YORK 9 255.255.255.0 172.17.5.1
5 None None None False None 10.10.87.0 LIVER 10 255.255.255.0 172.17.5.1
6 None BOB MPLS False None 10.10.120.0 EGGS 11 255.255.255.0 172.17.5.1
output
>>> route_data.loc[route_data.circuit_type.isnull(), :]
Empty DataFrame
Columns: [circuit_id, circuit_provider, circuit_type, down, errors, route, site, site_id, mask, next_hop]
Index: []
>>>
EDIT 2: further test, this works for finding the rows
route_data.loc[route_data.circuit_type.str.contains("None"), :]
however when I try to test and overwrite only the rows with circuit type to none as per the below, it just overwrites all the rows, so instead of the original table with the None rows altered, I just get the new table with none of the original data
f = route_data.loc[route_data.circuit_type.str.contains("None"), :] = device_route_data
Upvotes: 1
Views: 606
Reputation: 11381
The values of next_hop
that you're trying to select aren't the string "nan"
, but a special value known as "not a number" or NaN
(note the mixed capitalization). Pandas has convenient functions for dealing with NaN
and other null values, like isnull()
:
df1.loc[df.some_column.isnull(), :] = df2
With your names:
route_data.loc[route_data.next_hop.isnull(), :] = device_route_data
isnull()
isolates the rows with NaN
values in your column of interest. Because you're using .loc[]
, you can select all the columns using :
, without having to specify them manually. And you don't need to select all the columns from the second dataframe either - it will use them all by default.
This question and answer is similar to this one: Pandas replace all items in a row with NaN if one value is NaN.
Upvotes: 2