gmorton

Reputation: 1

Comparing floats in Python

I am having trouble comparing floats within conditional statements in Python. I have a dataset that looks like this:

         CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates  
0                        7        UNK              4  
1                        3     #NAME?           -243  
2                        1        UNK           -591  
3                        0  Exec Code            -93  
4                        0        UNK            -77 

What I am trying to accomplish is to loop through the CVSS Score column (type float) and add a new column, Class Number, to each row. If the score is in the range 0 <= score < 6, the class number should be 1; if it is in the range 6 <= score < 7.5, it should be 2; and if it is in the range 7.5 <= score < 10, it should be 3. If done correctly, this is what it should look like:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            1  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            2  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3 

My code right now looks like this:

data = pd.read_csv('tag_SA.txt', sep='|')
for score in data['CVSS Score']:
    if 0.0 < score < 6.0:
        data["Class Number"] = 1
    elif(6 <= score < 7.5):
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3

and the output I am getting is this:

           CVE-ID CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702          6/3/2016    6/7/2016         5.9   
1   CVE-2015-8951         8/15/2016  12/16/2015         7.8   
2   CVE-2015-9016         3/28/2017   8/15/2015         7.0   
3  CVE-2016-10230          3/1/2017  11/28/2016         9.8   
4  CVE-2016-10231          3/1/2017  12/14/2016         7.8   

                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41   
1  Multiple use-after-free vulnerabilities in sou...                10   
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3   
3  A remote code execution vulnerability in the Q...                 7   
4  An elevation of privilege vulnerability in the...                 8   

   number of lines removed  Vuln Type  Diff of dates Class Number  
0                        7        UNK              4            3  
1                        3     #NAME?           -243            3  
2                        1        UNK           -591            3  
3                        0  Exec Code            -93            3  
4                        0        UNK            -77            3 

So it is just going to the else statement and considering the other statements to be false. Is there something I am missing with float comparisons in Python? Any help would be appreciated.

Upvotes: 0

Views: 130

Answers (2)

MaxNoe

Reputation: 14987

Your problem is not about comparing floats; it is that you overwrite the whole column of the dataframe each time you assign, so the value set in the last loop iteration is what every row ends up with.
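
To see the effect, here is a minimal sketch that uses only the five CVSS scores shown in the question (the small DataFrame and the history list are made up for illustration; nothing is read from tag_SA.txt). Assigning a scalar to a column broadcasts it to all rows, so each pass through the loop replaces the entire column:

```python
import pandas as pd

# Illustrative data: the five scores from the question's sample output.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})

history = []  # hypothetical helper: snapshot of the column after each iteration
for score in data["CVSS Score"]:
    if 0.0 <= score < 6.0:
        data["Class Number"] = 1  # the scalar is broadcast to every row
    elif 6.0 <= score < 7.5:
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3
    history.append(data["Class Number"].tolist())

for column_state in history:
    print(column_state)
```

The first snapshot is all 1s and the third is all 2s, so the comparisons themselves work. The last score is 7.8, which is why the final column is all 3s.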

You need to set only those rows where the condition is fulfilled; see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html. You should probably also go over https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html.

Using what is documented there:

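import pandas as pd

data = pd.read_csv('tag_SA.txt', sep='|')

# default class for every row (scores of 7.5 and above)
data['Class Number'] = 3

# 0 <= score < 6 -> class 1
mask = (0.0 <= data['CVSS Score']) & (data['CVSS Score'] < 6.0)
data.loc[mask, 'Class Number'] = 1

# 6 <= score < 7.5 -> class 2
mask = (6.0 <= data['CVSS Score']) & (data['CVSS Score'] < 7.5)
data.loc[mask, 'Class Number'] = 2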

You can also use pandas.cut like this:

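# right=False makes each bin closed on the left: [0, 6), [6, 7.5), [7.5, inf)
bins = [0, 6, 7.5, float('inf')]
# pd.cut returns a categorical Series; its codes start at 0, so add 1
data['Class Number'] = pd.cut(data['CVSS Score'], bins, right=False).cat.codes + 1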
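
As a self-contained sketch of the pandas.cut approach, the snippet below bins the five scores from the question and compares the result with the Class Number column the question expects. The open-ended top bin (float("inf")) is an assumption, chosen so that a score of exactly 10 still lands in class 3, and the variable names are illustrative.

```python
import pandas as pd

# Illustrative data: the five scores from the question's sample output.
scores = pd.Series([5.9, 7.8, 7.0, 9.8, 7.8], name="CVSS Score")

# right=False gives left-closed bins: [0, 6), [6, 7.5), [7.5, inf)
bins = [0, 6, 7.5, float("inf")]
classes = pd.cut(scores, bins=bins, right=False, labels=[1, 2, 3])

# pd.cut returns a categorical Series; cast to int to get plain numbers.
class_numbers = classes.astype(int)
print(class_numbers.tolist())  # [1, 3, 2, 3, 3]
```

Passing labels directly avoids the "+ 1" step, at the cost of having to cast the categorical result if plain integers are wanted.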

Upvotes: 1

Mike Sukmanowsky

Reputation: 4477

Try using the apply method of a Series and assign the result to a new column named Class Number.

In your case, it'll look something like:

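import pandas as pd

data = pd.DataFrame({'CVSS Score': [1, 2, .5, 6.2, 6.3, 9, 19, 6.1, 2, .5]})

def classify_cvss_score(score):
    # ranges follow the question: [0, 6) -> 1, [6, 7.5) -> 2, everything else -> 3
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2

    return 3

# apply calls the function once per score and returns a new Series
data['Class Number'] = data['CVSS Score'].apply(classify_cvss_score)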
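
If calling a Python function per row turns out to be slow on a large file, one possible alternative (a different technique from apply, shown here only as a sketch) is numpy.select, which evaluates the same conditions on the whole column at once. The sample scores are the five from the question, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data: the five scores from the question's sample output.
data = pd.DataFrame({"CVSS Score": [5.9, 7.8, 7.0, 9.8, 7.8]})
score = data["CVSS Score"]

# One boolean mask per class; the first matching condition wins.
conditions = [
    (score >= 0) & (score < 6),    # class 1
    (score >= 6) & (score < 7.5),  # class 2
]
# Rows matching neither condition fall back to class 3.
data["Class Number"] = np.select(conditions, [1, 2], default=3)
print(data["Class Number"].tolist())  # [1, 3, 2, 3, 3]
```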

Upvotes: 0
