Reputation: 1
I am having trouble comparing floats within conditional statements in Python. I have a dataset that looks like this:
           CVE-ID  CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702           6/3/2016    6/7/2016         5.9
1   CVE-2015-8951          8/15/2016  12/16/2015         7.8
2   CVE-2015-9016          3/28/2017   8/15/2015         7.0
3  CVE-2016-10230           3/1/2017  11/28/2016         9.8
4  CVE-2016-10231           3/1/2017  12/14/2016         7.8
                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41
1  Multiple use-after-free vulnerabilities in sou...                10
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3
3  A remote code execution vulnerability in the Q...                 7
4  An elevation of privilege vulnerability in the...                 8
   number of lines removed  Vuln Type  Diff of dates
0                        7        UNK              4
1                        3     #NAME?           -243
2                        1        UNK           -591
3                        0  Exec Code            -93
4                        0        UNK            -77
What I am trying to accomplish is to loop through the CVSS Score column (type float) and add a column, Class Number, to each row: if the score is in the range 0 <= score < 6, the class number is 1; if it is in the range 6 <= score < 7.5, the class number is 2; and if it is in the range 7.5 <= score < 10, the class number is 3. If done correctly, this is what it should look like:
           CVE-ID  CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702           6/3/2016    6/7/2016         5.9
1   CVE-2015-8951          8/15/2016  12/16/2015         7.8
2   CVE-2015-9016          3/28/2017   8/15/2015         7.0
3  CVE-2016-10230           3/1/2017  11/28/2016         9.8
4  CVE-2016-10231           3/1/2017  12/14/2016         7.8
                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41
1  Multiple use-after-free vulnerabilities in sou...                10
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3
3  A remote code execution vulnerability in the Q...                 7
4  An elevation of privilege vulnerability in the...                 8
   number of lines removed  Vuln Type  Diff of dates  Class Number
0                        7        UNK              4             1
1                        3     #NAME?           -243             3
2                        1        UNK           -591             2
3                        0  Exec Code            -93             3
4                        0        UNK            -77             3
My code right now looks like this:
import pandas as pd
data = pd.read_csv('tag_SA.txt', sep='|')
for score in data['CVSS Score']:
    if 0.0 < score < 6.0:
        data["Class Number"] = 1
    elif 6 <= score < 7.5:
        data["Class Number"] = 2
    else:
        data["Class Number"] = 3
and the output I am getting is this:
           CVE-ID  CVE Creation Date  Patch Date  CVSS Score  \
0   CVE-2012-6702           6/3/2016    6/7/2016         5.9
1   CVE-2015-8951          8/15/2016  12/16/2015         7.8
2   CVE-2015-9016          3/28/2017   8/15/2015         7.0
3  CVE-2016-10230           3/1/2017  11/28/2016         9.8
4  CVE-2016-10231           3/1/2017  12/14/2016         7.8
                                     Bug Description  # of lines added  \
0  Expat, when used in a parser that has not call...                41
1  Multiple use-after-free vulnerabilities in sou...                10
2  In blk_mq_tag_to_rq in blk-mq.c in the upstrea...                 3
3  A remote code execution vulnerability in the Q...                 7
4  An elevation of privilege vulnerability in the...                 8
   number of lines removed  Vuln Type  Diff of dates  Class Number
0                        7        UNK              4             3
1                        3     #NAME?           -243             3
2                        1        UNK           -591             3
3                        0  Exec Code            -93             3
4                        0        UNK            -77             3
So every row seems to end up in the else branch, as if the other conditions were false. Is there something I am missing about float comparisons in Python? Any help would be appreciated.
Upvotes: 0
Views: 130
Reputation: 14987
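Your problem is not about comparing floats; it is that you overwrite the whole column of the DataFrame every time you assign. An assignment like data["Class Number"] = 1 writes that single value into every row, so after the loop every row holds the class of the last score (7.8 in your data, hence 3).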
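To see the overwrite in isolation, here is a minimal sketch with three made-up scores (one per class). The helper classify is a hypothetical name, not something from the question. The first loop behaves like the question's code; the second writes one cell per row with .loc[row_label, column]:

```python
import pandas as pd

def classify(score):
    # Hypothetical helper using the question's ranges: [0, 6) -> 1, [6, 7.5) -> 2, otherwise 3.
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2
    return 3

# Three made-up scores, one per class (illustrative only).
data = pd.DataFrame({'CVSS Score': [5.9, 7.0, 7.8]})

# Like the loop in the question: each iteration assigns a scalar to the whole
# column, so the class of the last score (7.8 -> 3) ends up in every row.
for score in data['CVSS Score']:
    data['Class Number'] = classify(score)
print(data['Class Number'].tolist())  # [3, 3, 3]

# Writing one cell at a time with .loc[row_label, column] keeps each row's value.
for idx, score in data['CVSS Score'].items():
    data.loc[idx, 'Class Number'] = classify(score)
print(data['Class Number'].tolist())  # [1, 2, 3]
```

Row-by-row .loc writes work, but on large frames the mask-based assignment that follows is usually much faster because it avoids the Python-level loop.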
You need to set only the rows where the condition is fulfilled; see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html. You should probably also go over https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html.
Using what is documented there:
import pandas as pd
data = pd.read_csv('tag_SA.txt', sep='|')
# default every row to class 3, then overwrite only the rows each mask selects
data['Class Number'] = 3
mask = (0.0 <= data['CVSS Score']) & (data['CVSS Score'] < 6.0)
data.loc[mask, 'Class Number'] = 1
mask = (6.0 <= data['CVSS Score']) & (data['CVSS Score'] < 7.5)
data.loc[mask, 'Class Number'] = 2
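A self-contained sketch of the same mask approach, run on the five CVSS scores from the question (the other columns are left out) and written with the question's ranges 0 <= score < 6 and 6 <= score < 7.5. It should reproduce the Class Number column the question expects:

```python
import pandas as pd

# The five CVSS scores from the question; the other columns are omitted.
data = pd.DataFrame({'CVSS Score': [5.9, 7.8, 7.0, 9.8, 7.8]})
score = data['CVSS Score']

# Default every row to class 3, then overwrite only the rows each mask selects.
data['Class Number'] = 3
data.loc[(0.0 <= score) & (score < 6.0), 'Class Number'] = 1
data.loc[(6.0 <= score) & (score < 7.5), 'Class Number'] = 2

print(data['Class Number'].tolist())  # [1, 3, 2, 3, 3]
```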
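You can also use pandas.cut like this:
# right=False makes the bins half-open: [0, 6), [6, 7.5), [7.5, inf)
bins = [0, 6, 7.5, float('inf')]
# pd.cut on a Series returns a categorical Series; its codes start at 0, so add 1
data['Class Number'] = pd.cut(data['CVSS Score'], bins, right=False).cat.codes + 1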
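The bin edges decide which class a boundary score lands in. A minimal sketch comparing pd.cut with right=False (half-open bins, matching the question's ranges) against the default right=True, on illustrative scores that include the edge values 0, 6 and 7.5. Using float('inf') as the last edge is an assumption here, chosen so that a score of 10.0 still falls in class 3:

```python
import pandas as pd

# Edge values plus a few scores from the question (illustrative only).
scores = pd.Series([0.0, 5.9, 6.0, 7.0, 7.5, 9.8, 10.0])
bins = [0, 6, 7.5, float('inf')]

# right=False: bins are [0, 6), [6, 7.5), [7.5, inf).
half_open = pd.cut(scores, bins, right=False).cat.codes + 1
# Default right=True: bins are (0, 6], (6, 7.5], (7.5, inf).
closed_right = pd.cut(scores, bins).cat.codes + 1

print(half_open.tolist())     # [1, 1, 2, 2, 3, 3, 3]
print(closed_right.tolist())  # [0, 1, 1, 2, 2, 3, 3]
```

With right=True, 6.0 is put in class 1 and 7.5 in class 2, and 0.0 falls outside every bin, gets the code -1 and therefore ends up as class 0.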
Upvotes: 1
Reputation: 4477
Try using the apply method of a Series and assign the result to a new column named Class Number.
In your case, it'll look something like:
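import pandas as pd
data = pd.DataFrame({'CVSS Score': [1, 2, .5, 6.2, 6.3, 9, 19, 6.1, 2, .5]})
def classify_cvss_score(score):
    # the question's ranges: [0, 6) -> 1, [6, 7.5) -> 2, otherwise 3
    if 0 <= score < 6:
        return 1
    elif 6 <= score < 7.5:
        return 2
    return 3
# apply calls the function once per value and returns a new Series
data['Class Number'] = data['CVSS Score'].apply(classify_cvss_score)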
Upvotes: 0