Reputation: 559
I am trying to find a way to look for the missing data in dataframe based on the data in my list. Each interface must have this 5 sub-interfaces.
sub_interface_list = ['1030', '1035', '1039', '1050', '1059']
df = pd.DataFrame({'Device': ['DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA'], 'Interface': ['Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk101', 'Eth-Trunk101', 'Eth-Trunk101', 'Eth-Trunk101'], 'Sub_interface': ['1030', '1035', '1039', '1050', '1059', '1030', '1039', '1050', '1059']})
The dataframe looks like this
Device Interface Sub_interface
DeviceA Eth-Trunk100 1030
DeviceA Eth-Trunk100 1035
DeviceA Eth-Trunk100 1039
DeviceA Eth-Trunk100 1050
DeviceA Eth-Trunk100 1059
DeviceA Eth-Trunk101 1030
DeviceA Eth-Trunk101 1039
DeviceA Eth-Trunk101 1050
DeviceA Eth-Trunk101 1059
From the list we can see that Eth-Trunk101 is missing the sub_interface of 1035, and I would like to insert the 1035 into the last row of each interface. I know using dataframe.iterrows() and search for the missing element is easy, but is there any way in pandas that can be used without using the for loop?
** This is a test data set, my data is much bigger and using iteration will be very time consuming.
Upvotes: 3
Views: 121
Reputation: 28644
You could use the complete function from pyjanitor to expose the missing values:
df.complete(['Interface', 'Sub_interface'])
Interface Sub_interface Device
0 Eth-Trunk100 1030 DeviceA
1 Eth-Trunk100 1035 DeviceA
2 Eth-Trunk100 1039 DeviceA
3 Eth-Trunk100 1050 DeviceA
4 Eth-Trunk100 1059 DeviceA
5 Eth-Trunk101 1030 DeviceA
6 Eth-Trunk101 1035 NaN
7 Eth-Trunk101 1039 DeviceA
8 Eth-Trunk101 1050 DeviceA
9 Eth-Trunk101 1059 DeviceA
You can use ffill
to fill the null value:
df.complete(['Interface', 'Sub_interface']).ffill()
If you want to stick solely within Pandas (pyjanitor is a collection of convenient wrappers around Pandas), the solution below works well:
Create unique index of interface
and sub_interface
:
interface = pd.MultiIndex.from_product([df.Interface.unique(),
df.Sub_interface.unique()])
In [456]: interface
Out[456]:
MultiIndex([('Eth-Trunk100', '1030'),
('Eth-Trunk100', '1035'),
('Eth-Trunk100', '1039'),
('Eth-Trunk100', '1050'),
('Eth-Trunk100', '1059'),
('Eth-Trunk101', '1030'),
('Eth-Trunk101', '1035'),
('Eth-Trunk101', '1039'),
('Eth-Trunk101', '1050'),
('Eth-Trunk101', '1059')],
)
Set interface
and sub_interface
as index, reindex with interface
and reset_index:
df.set_index(['Interface', 'Sub_interface']).reindex(interface).reset_index()
Interface Sub_interface Device
0 Eth-Trunk100 1030 DeviceA
1 Eth-Trunk100 1035 DeviceA
2 Eth-Trunk100 1039 DeviceA
3 Eth-Trunk100 1050 DeviceA
4 Eth-Trunk100 1059 DeviceA
5 Eth-Trunk101 1030 DeviceA
6 Eth-Trunk101 1035 NaN
7 Eth-Trunk101 1039 DeviceA
8 Eth-Trunk101 1050 DeviceA
9 Eth-Trunk101 1059 DeviceA
Reindexing here works because the combination of interface
and sub_interface
is unique; if it is not unique, then a merge on outer
is a better step; complete takes care of these checks in the background.
Also be wary of setting index with nulls; the Pandas docs suggests to avoid it - Although, so far with reindexing I have not noticed any issues.
You could also use unstack/stack, since the index is unique:
df.set_index(['Interface', 'Sub_interface']).unstack().stack(dropna = False).reset_index()
Interface Sub_interface Device
0 Eth-Trunk100 1030 DeviceA
1 Eth-Trunk100 1035 DeviceA
2 Eth-Trunk100 1039 DeviceA
3 Eth-Trunk100 1050 DeviceA
4 Eth-Trunk100 1059 DeviceA
5 Eth-Trunk101 1030 DeviceA
6 Eth-Trunk101 1035 NaN
7 Eth-Trunk101 1039 DeviceA
8 Eth-Trunk101 1050 DeviceA
9 Eth-Trunk101 1059 DeviceA
Upvotes: 4
Reputation: 150735
One way is to pivot then stack:
(df.assign(dummy=1)
.pivot_table(index=['Device','Interface'], columns='Sub_interface',
values='dummy', fill_value=1)
.reindex(sub_interface_list, fill_value=1, axis=1)
.stack().reset_index(name='dummy')
.drop('dummy', axis=1)
)
Output:
Device Interface Sub_interface
0 DeviceA Eth-Trunk100 1030
1 DeviceA Eth-Trunk100 1035
2 DeviceA Eth-Trunk100 1039
3 DeviceA Eth-Trunk100 1050
4 DeviceA Eth-Trunk100 1059
5 DeviceA Eth-Trunk101 1030
6 DeviceA Eth-Trunk101 1035
7 DeviceA Eth-Trunk101 1039
8 DeviceA Eth-Trunk101 1050
9 DeviceA Eth-Trunk101 1059
Upvotes: 1