Reputation: 59
I have a column in a dataframe with different strings.
Additional Information |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray
Undefined System Error
Specific groupname=CUSTGR1
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck
What I want to do is to create new columns, IP Address and MAC Address with the corresponding values from the column above.
So that the expected output looks like this:
Additional Information |IP Address | MAC Address |
IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, USER=kwfinn |192.168.1.1 |00:0a:95:9d:68:16|
IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, USER=wattray|192.168.0.1 |00:0a:95:9d:68:17|
Undefined System Error | | |
Specific groupname=CUSTGR1 | | |
IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, USER=stwnck |192.168.1.2 |00:1B:44:11:3A:B7|
The problem is, that I cannot deal with the rows that does not contain IP and MAC. I tried splitting using np.where as well as finding partial matches but didn't succeed.
Upvotes: 1
Views: 68
Reputation: 862471
Idea is use list comprehension with filtering if not missing value or None and exist ,
and =
, pass to DataFrame
constructor and last use DataFrame.join
to original:
L = [dict(y.split("=") for y in v.split(", "))
if pd.notna(v) and ('=' in v) and (', ' in v)
else {}
for v in df['Additional Information']]
df1 = pd.DataFrame(L, index=df.index)
print (df1)
IP MAC ADDR USER
0 192.168.1.1 00:0a:95:9d:68:16 kwfinn
1 192.168.0.1 00:0a:95:9d:68:17 wattray
2 NaN NaN NaN
3 NaN NaN NaN
4 192.168.1.2 00:1B:44:11:3A:B7 stwnck
df = df.join(df1[['IP','MAC ADDR']])
print (df)
Additional Information IP \
0 IP=192.168.1.1, MAC ADDR=00:0a:95:9d:68:16, US... 192.168.1.1
1 IP=192.168.0.1, MAC ADDR=00:0a:95:9d:68:17, US... 192.168.0.1
2 Undefined System Error NaN
3 Specific groupname=CUSTGR1 NaN
4 IP=192.168.1.2, MAC ADDR=00:1B:44:11:3A:B7, US... 192.168.1.2
MAC ADDR
0 00:0a:95:9d:68:16
1 00:0a:95:9d:68:17
2 NaN
3 NaN
4 00:1B:44:11:3A:B7
Upvotes: 3