Reputation: 193
This question could be a little tricky...
I have a function that labels the rows of a dataframe based on the values in some of its columns. The function receives two parameters: a dataframe and a list of dictionaries. Each dictionary maps column names (keys) to the value each column must have for a row to get a certain label, and its "label" key holds that label. For example:
{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1}
When the column "ip_src" of the dataframe has the value "192.168.84.129" and the column "ip_dst" has the value "192.168.84.128", those rows have to be labeled with 1. The thing is that these conditions vary, so I want to generalize the code so that I can pass other conditions too, such as:
{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}
and so on.
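A single criterion dictionary can be read as a row mask. Every key other than "label" names a column, and a row matches only if all of those columns equal the given values. The sketch below is a minimal version of that idea. It assumes pandas, and it uses `criterion_mask` and a three-row frame (copied from the question's sample input) purely for illustration. Neither is part of the original code.

```python
import pandas as pd


def criterion_mask(df: pd.DataFrame, criterion: dict) -> pd.Series:
    """Hypothetical helper: True for rows whose columns equal every non-"label" value."""
    # Build a filtered copy so the caller's dictionary keeps its "label" key.
    conds = {key: value for key, value in criterion.items() if key != "label"}
    # DataFrame.eq with a Series aligns the Series index on the column labels,
    # so each selected column is compared with its own target value.
    return df[list(conds)].eq(pd.Series(conds)).all(axis=1)


# Three rows from the question's sample input (only the relevant columns).
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "dst_port": [40002.0, 52165.0, 19305.0],
})

print(criterion_mask(df, {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1}).tolist())
# [True, False, False]
print(criterion_mask(df, {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}).tolist())
# [False, False, True]
```

The float column `dst_port` (19305.0) still compares equal to the integer 19305 in the criterion.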
I started with:
def labeling(df, crit):
    for dic in crit:
        lbl = dic["label"]
        del dic["label"]
        conds = []
        pairs = len(dic)
        for key in dic:
            conds.append((df[key] == dic[key]))
But I get stuck after the last line, because I can't figure out how to combine the conditions and then apply them, something like: df[conds] = lbl
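The missing step is to combine the per-column boolean Series with `&` into one mask and then assign through `.loc`. Below is a minimal sketch of how `labeling` could be finished. It makes three assumptions:

- every key except "label" is a column of `df`;
- each criterion has at least one such key;
- rows that match nothing get 0, as in the expected output.

The sketch skips "label" instead of using `del`, because `del` would remove the key from the caller's dictionaries. It also returns a labeled copy rather than modifying `df` in place.

```python
import operator
from functools import reduce

import pandas as pd


def labeling(df: pd.DataFrame, crit: list) -> pd.DataFrame:
    df = df.copy()
    df["label"] = 0  # assumption: rows matching no criterion are labeled 0
    for dic in crit:
        lbl = dic["label"]
        # One boolean Series per column condition; "label" is skipped, not deleted.
        conds = [df[key] == value for key, value in dic.items() if key != "label"]
        # Combine element-wise: conds[0] & conds[1] & ... & conds[n-1].
        mask = reduce(operator.and_, conds)
        df.loc[mask, "label"] = lbl
    return df


df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "dst_port": [40002.0, 52165.0, 19305.0],
})
crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
        {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

print(labeling(df, crit)["label"].tolist())  # [1, 0, 4]
```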
Thanks!
Edit:
Input:
   index          ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir
0      0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   40002.0         1
1      1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0
2      2   192.168.1.101     31.13.94.53      17.0          0.012948         172.0   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir  label
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   35456.0         1      1
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0      0
2   192.168.1.101     31.13.94.53      17.0          0.012948         172.0   52165.0   19305.0         1      4
Possible cases:
l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
{"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
{"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
{"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
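With several criteria, a row can match more than one of them. In the sample input, row 2 satisfies both the label-3 criterion (ip_src, ip_dst, ip_proto) and the label-4 criterion (ip_src, dst_port), and the expected output gives it 4.

One way to apply the whole list at once is `numpy.select`. This assumes that the later criterion should win and that unmatched rows get 0. `numpy.select` returns the choice of the *first* true condition, so the lists are passed in reverse. The sketch below is minimal: `mask_for` is a hypothetical helper, and only the columns the criteria use are kept.

```python
import numpy as np
import pandas as pd

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

# Sample rows from the question, trimmed to the columns the criteria use.
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "ip_proto": [17.0, 17.0, 17.0],
    "dst_port": [40002.0, 52165.0, 19305.0],
})


def mask_for(df, criterion):
    """Hypothetical helper: True where every non-"label" column equals its target."""
    mask = pd.Series(True, index=df.index)
    for col, value in criterion.items():
        if col != "label":
            mask &= df[col] == value
    return mask


# np.select takes the FIRST true condition per row, so reverse the lists
# to let later criteria take precedence (row 2 matches labels 3 and 4).
conditions = [mask_for(df, c) for c in reversed(l_crit)]
choices = [c["label"] for c in reversed(l_crit)]
df["label"] = np.select(conditions, choices, default=0)

print(df["label"].tolist())  # [1, 0, 4]
```

If the first matching criterion should win instead, drop the two `reversed` calls.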
Upvotes: 0
Views: 79
Reputation: 11192
Try this:
crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
        {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
dictionary = {}
for dic in crit:
    dictionary[dic['ip_src']] = dic['label']
df['label'] = df['ip_src'].map(dictionary).fillna(0)
Input:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   35456.0         1
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir  label
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   35456.0         1    1.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0    0.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1    4.0
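The `map` approach looks labels up by `ip_src` alone. The other keys of each criterion (`ip_dst`, `dst_port`, `ip_proto`) are never checked, and criteria that share a source address collapse into one dictionary entry. The short check below uses the longer `l_crit` list from the question, where "192.168.1.101" appears in three criteria. The last row is a hypothetical packet added for illustration.

```python
import pandas as pd

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

# Same construction as the answer: one dictionary entry keyed by ip_src.
dictionary = {}
for dic in l_crit:
    dictionary[dic["ip_src"]] = dic["label"]  # a repeated ip_src overwrites the earlier label

print(len(dictionary))              # 3 entries for 5 criteria
print(dictionary["192.168.1.101"])  # 4: labels 2 and 3 were overwritten

# Hypothetical row: same source address, but its ip_dst and dst_port match
# none of the three criteria that mention 192.168.1.101.
df = pd.DataFrame({"ip_src": ["192.168.1.101"], "ip_dst": ["8.8.8.8"], "dst_port": [80.0]})
df["label"] = df["ip_src"].map(dictionary).fillna(0)
print(df["label"].tolist())         # [4]
```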
Edit 1:
import pandas as pd

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
temp = pd.DataFrame()
l = []
v = []
for dic in l_crit:
    l.append(dic['ip_src'])
    v.append(dic['label'])
temp['ip_src'] = l
temp['label'] = v
df = pd.merge(df, temp, how='left', on=['ip_src'])
df['label'] = df['label'].fillna(0)
Input:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   35456.0         1
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1
Output:
           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  src_port  dst_port  flow_dir  label
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   52165.0   35456.0         1    1.0
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   40002.0   52165.0         0    0.0
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1    2.0
3   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1    3.0
4   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   52165.0   19305.0         1    4.0
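Rows 3 and 4 in this output are copies of row 2. `temp` holds "192.168.1.101" three times (labels 2, 3 and 4), and a left merge keyed only on `ip_src` emits one output row per matching `temp` row.

If duplicated rows are not wanted, `pd.merge` accepts `validate="many_to_one"`. With that option, pandas checks that the merge key is unique in the right frame and raises `pandas.errors.MergeError` otherwise. The minimal sketch below reproduces both behaviors on the sample rows, with columns trimmed to the ones that matter.

```python
import pandas as pd

df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "dst_port": [35456.0, 52165.0, 19305.0],
})
# Lookup table built from l_crit as in the answer: one row per criterion.
temp = pd.DataFrame({
    "ip_src": ["192.168.84.129", "192.168.1.100", "192.168.1.101",
               "192.168.1.101", "192.168.1.101"],
    "label": [1, 1, 2, 3, 4],
})

merged = pd.merge(df, temp, how="left", on=["ip_src"])
print(len(df), len(merged))  # 3 5  (row 2 matched three temp rows)

try:
    pd.merge(df, temp, how="left", on=["ip_src"], validate="many_to_one")
    raised = False
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)  # ip_src is not unique in temp
```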
Upvotes: 1