sooaran

Reputation: 193

Concatenate conditions

This question could be a little tricky...

I have a function that labels a dataframe based on the values in some of its columns. The function receives two parameters: a dataframe and a list of dictionaries. Each dictionary has key-value pairs that indicate the columns (keys) and the values those columns must have for a row to be labeled with a certain number. For example:

{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1}

When the column "ip_src" of the dataframe has the value "192.168.84.129" and the column "ip_dst" has the value "192.168.84.128", those rows have to be labeled with a 1. The thing is that these conditions may vary, so I want to generalize the code so that I can pass other conditions, such as:

{"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}

and so on.

I started with:

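import pandas as pd

def labeling(df, crit):
    # crit is a list of dicts: {column: required value, ..., "label": number}
    for dic in crit:
        lbl = dic["label"]
        # copy the dict without "label" instead of deleting it from the caller's dict
        conditions = {k: v for k, v in dic.items() if k != "label"}
        conds = []
        for key in conditions:
            # one boolean Series per column/value pair
            conds.append(df[key] == conditions[key])
        # stuck here: how do I combine conds and assign lbl to the matching rows?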

But I get stuck at the last step, because I can't figure out how to concatenate (AND together) the conditions and then apply them as: df[conds] = lbl
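The missing step, combining a list of boolean Series into a single mask, can be done by folding the list with the element-wise `&` operator via `functools.reduce`. The sketch below uses a small hypothetical two-row frame and assumes a `label` column initialized to 0 for rows that match nothing. Note that the assignment uses `df.loc[mask, "label"]`, because `df[mask] = lbl` would overwrite every column of the selected rows.

```python
import functools
import operator

import pandas as pd

# Hypothetical toy frame with two of the columns from the question
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53"],
    "ip_dst": ["192.168.84.128", "192.168.1.101"],
})
df["label"] = 0  # assumed default for rows that match no criterion

crit = {"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128"}
lbl = 1

# One boolean Series per column/value pair
conds = [df[key] == value for key, value in crit.items()]

# Fold the Series together with element-wise AND: c1 & c2 & ... & cn
mask = functools.reduce(operator.and_, conds)

# Select the matching rows and write only the "label" column
df.loc[mask, "label"] = lbl
print(df["label"].tolist())  # [1, 0]
```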

Thanks!

Edit:

Input:

   index         ip_src         ip_dst  ip_proto  frame_time_delta  \
0      0  192.168.84.129 192.168.84.128      17.0          0.000000   
1      1    31.13.94.53  192.168.1.101      17.0          0.006656   
2      2  192.168.1.101    31.13.94.53      17.0          0.012948   

   payload_size  src_port  dst_port  flow_dir  
0         172.0   52165.0   40002.0         1  
1         176.0   40002.0   52165.0         0  
2         172.0   52165.0   19305.0         1 

Output:

       ip_src         ip_dst       ip_proto  frame_time_delta  \
0  192.168.84.129 192.168.84.128     17.0          0.000000   
1    31.13.94.53  192.168.1.101      17.0          0.006656   
2  192.168.1.101    31.13.94.53      17.0          0.012948   

   payload_size  src_port  dst_port  flow_dir   label
0         172.0   52165.0   35456.0         1    1 
1         176.0   40002.0   52165.0         0    0
2         172.0   52165.0   19305.0         1    4

Possible cases:

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]
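Putting the pieces together, a possible version of `labeling` builds one mask per criterion and applies the criteria in list order. This is a sketch under stated assumptions: unmatched rows get 0 (as in the expected output), every column named in a criterion exists in the frame, and a later matching criterion overwrites an earlier one. With the sample input, row 2 satisfies both the label 3 and the label 4 criteria, and last-match-wins gives the expected 4. The frame below keeps only the columns that `l_crit` uses.

```python
import functools
import operator

import pandas as pd


def labeling(df, crit, default=0):
    """Return a copy of df with a "label" column derived from crit.

    Each dict in crit maps column names to required values plus a "label" key.
    Criteria are applied in order, so a later match overwrites an earlier one.
    """
    out = df.copy()
    out["label"] = default
    for dic in crit:
        # Separate the label from the conditions without mutating the caller's dict
        conditions = {k: v for k, v in dic.items() if k != "label"}
        conds = [out[key] == value for key, value in conditions.items()]
        # Start from an all-True mask so the fold also works for an empty list
        start = pd.Series(True, index=out.index)
        mask = functools.reduce(operator.and_, conds, start)
        out.loc[mask, "label"] = dic["label"]
    return out


# Sample rows from the question (only the columns used by l_crit)
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "ip_dst": ["192.168.84.128", "192.168.1.101", "31.13.94.53"],
    "ip_proto": [17.0, 17.0, 17.0],
    "dst_port": [40002.0, 52165.0, 19305.0],
})

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

result = labeling(df, l_crit)
print(result["label"].tolist())  # [1, 0, 4]
```

Comparing the float columns with integer criteria (`17.0 == 17`, `19305.0 == 19305`) evaluates to `True` in pandas, so the criteria do not need to match the column dtypes exactly.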

Upvotes: 0

Views: 79

Answers (1)

Mohamed Thasin ah

Reputation: 11192

Try this:

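import pandas as pd

crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
        {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

# Map each source IP to its label (only "ip_src" is used; other columns are ignored)
dictionary = {}
for dic in crit:
    dictionary[dic['ip_src']] = dic['label']
# ip_src values not in the dictionary map to NaN, which is replaced by 0
df['label'] = df['ip_src'].map(dictionary).fillna(0)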

Input:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  
0   52165.0   35456.0         1  
1   40002.0   52165.0         0  
2   52165.0   19305.0         1

Output:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  label  
0   52165.0   35456.0         1    1.0  
1   40002.0   52165.0         0    0.0  
2   52165.0   19305.0         1    4.0 

Edit 1:

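import pandas as pd

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

# Lookup table of (ip_src, label) pairs; only ip_src is used as the join key
temp = pd.DataFrame()

l = []
v = []
for dic in l_crit:
    l.append(dic['ip_src'])
    v.append(dic['label'])
temp['ip_src'] = l
temp['label'] = v

# Left join on ip_src; an ip_src that appears several times in l_crit
# duplicates the matching rows of df (one copy per label)
df = pd.merge(df, temp, how='left', on=['ip_src'])
# Rows with no matching ip_src get NaN, which is replaced by 0
df['label'] = df['label'].fillna(0)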
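The merge above returns more rows than the input because "192.168.1.101" appears three times in `l_crit`, so row 2 is repeated once per label. If one label per row is wanted, a possible adjustment is to drop duplicate keys from the lookup table before merging. This sketch assumes that the last criterion for a given `ip_src` should win, and, like the answer, it still ignores every column except `ip_src`. The frame below is a trimmed, hypothetical version of the sample input. `validate="many_to_one"` makes `pd.merge` raise a `MergeError` if the right-hand keys are still not unique.

```python
import pandas as pd

# Trimmed version of the sample input (only the columns needed here)
df = pd.DataFrame({
    "ip_src": ["192.168.84.129", "31.13.94.53", "192.168.1.101"],
    "dst_port": [35456.0, 52165.0, 19305.0],
})

l_crit = [{"ip_src": "192.168.84.129", "ip_dst": "192.168.84.128", "label": 1},
          {"ip_src": "192.168.1.100", "ip_dst": "192.168.1.105", "dst_port": 9999, "label": 1},
          {"ip_src": "192.168.1.101", "ip_dst": "104.44.195.76", "label": 2},
          {"ip_src": "192.168.1.101", "ip_dst": "31.13.94.53", "ip_proto": 17, "label": 3},
          {"ip_src": "192.168.1.101", "dst_port": 19305, "label": 4}]

# Lookup table with one row per criterion (only ip_src is kept, as in the answer)
temp = pd.DataFrame({
    "ip_src": [dic["ip_src"] for dic in l_crit],
    "label": [dic["label"] for dic in l_crit],
})

# Keep the last label per ip_src so the join key is unique on the right side
temp = temp.drop_duplicates(subset="ip_src", keep="last")

# A left merge preserves the row order of df; validate guards against duplicates
out = pd.merge(df, temp, how="left", on="ip_src", validate="many_to_one")
out["label"] = out["label"].fillna(0)
print(len(out), out["label"].tolist())  # 3 [1.0, 0.0, 4.0]
```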

Input:

          ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  
0   52165.0   35456.0         1  
1   40002.0   52165.0         0  
2   52165.0   19305.0         1

Output:

           ip_src          ip_dst  ip_proto  frame_time_delta  payload_size  \
0  192.168.84.129  192.168.84.128      17.0          0.000000         172.0   
1     31.13.94.53   192.168.1.101      17.0          0.006656         176.0   
2   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   
3   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   
4   192.168.1.101     31.13.94.53      17.0          0.012948          72.0   

   src_port  dst_port  flow_dir  label  
0   52165.0   35456.0         1    1.0  
1   40002.0   52165.0         0    0.0  
2   52165.0   19305.0         1    2.0  
3   52165.0   19305.0         1    3.0  
4   52165.0   19305.0         1    4.0 

Upvotes: 1
