I understand that Python treats True as 1 (as do many programming languages), so taking the sum() of a list of booleans should return the number of True values in the list (as demonstrated in Counting the number of True Booleans in a Python List).
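A minimal illustration of that premise with a plain Python list of booleans (the values are made up for the example):

```python
# bool is a subclass of int: True == 1 and False == 0
flags = [True, False, True, True, False]

# sum() adds the elements starting from 0, so it counts the True values
count = sum(flags)
print(count)  # 3
```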
I'm new to Python and have been going through some of the ISLR application exercises in Python (http://www.springer.com/us/book/9781461471370).
Chapter 2, problem 10 (h) has a pretty simple question asking for the number of observations of a variable ('rm') that are greater than 7. I would expect the following code to work:
test = [Boston['rm'] > 7]
sum(test)
However, this returns the contents of "test" as 0s and 1s rather than their sum. Can anyone explain why? (Note: Boston is the Boston data set from the MASS package in R.)
If I use a tuple or numpy array instead of a list, it works just fine; for example:
test2 = (Boston['rm'] > 7)
sum(test2)
test3 = np.array(Boston['rm'] > 7)
sum(test3)
Also, "test" seems to be a proper Boolean list, because the following code, which uses the same comparison to subset "Boston", also works fine:
test4 = Boston[Boston['rm'] > 7]
len(test4)
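For comparison, a NumPy sketch of the subsetting approach, using hypothetical values in place of the Boston 'rm' column: the same boolean mask both selects entries and, when summed directly, counts them.

```python
import numpy as np

# Made-up values standing in for Boston['rm'] (hypothetical data)
rm = np.array([6.5, 7.2, 8.1, 5.9, 7.0])
mask = rm > 7           # boolean mask: [False, True, True, False, False]

subset = rm[mask]       # keep only the entries where the mask is True
print(len(subset))      # 2
print(mask.sum())       # 2: counting the Trues gives the same number
```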
While I have clearly found several methods that work, I'm confused why the first did not. Thanks in advance.
If I use a tuple or numpy array instead of a list it works just fine; for example:
test2 = (Boston['rm'] > 7)
sum(test2)
test3 = np.array(Boston['rm'] > 7)
sum(test3)
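(Boston['rm'] > 7) uses parentheses for grouping; it isn't a tuple. The tuple equivalent would be (Boston['rm'] > 7,) (note the comma), and it breaks in the same way as the list does. Using np.array on something that is already an array doesn't change it; it's like the difference between list(xs) and [xs] when xs is already a list: the first gives back an equivalent list, while the second wraps it in a new one-element list.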
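A small sketch of the grouping-versus-tuple point, using NumPy and a made-up mask in place of Boston['rm'] > 7 (the data is hypothetical):

```python
import numpy as np

# A made-up boolean mask standing in for Boston['rm'] > 7
mask = np.array([6.5, 7.2, 8.1, 5.9]) > 7

grouped = (mask)       # parentheses only group: this is still the same array
one_tuple = (mask,)    # the trailing comma makes a one-element tuple
wrapped = [mask]       # brackets make a one-element list

print(grouped is mask)                  # True
print(type(one_tuple), len(one_tuple))  # <class 'tuple'> 1

# np.array on an existing array gives an equal array, not a nested one
print(np.array(mask).shape)     # (4,)
print(np.array(wrapped).shape)  # (1, 4)

print(sum(grouped))    # 2: sums the booleans themselves
print(sum(one_tuple))  # [0 1 1 0]: same problem as the list
```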
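As for why it doesn’t work: Boston['rm'] > 7 is already an array of booleans (a boolean pandas Series, if Boston is a DataFrame), so you want to take its sum directly. Wrapping it in another list means you’re taking the sum of a one-element list containing an array, not a list of booleans. sum() starts from 0 and adds each element, so sum([Boston['rm'] > 7]) is just 0 + (Boston['rm'] > 7), which adds 0 elementwise and gives back the same values as 0s and 1s.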
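A minimal NumPy sketch of that explanation, assuming a made-up mask in place of Boston['rm'] > 7; a boolean pandas Series should behave the same way for these operations:

```python
import numpy as np

# Made-up values; the comparison produces a boolean array
mask = np.array([6.5, 7.2, 8.1, 5.9]) > 7

# sum() starts from 0 and adds each list element. The list holds exactly one
# element (the whole mask), so this is 0 + mask, evaluated elementwise:
# the booleans come back as integers 0 and 1.
wrapped_sum = sum([mask])
print(wrapped_sum)                      # [0 1 1 0]
print((wrapped_sum == 0 + mask).all())  # True

# Counting the True values directly, without the extra list
print(mask.sum())              # 2
print(sum(mask))               # 2 (iterates over the booleans)
print(np.count_nonzero(mask))  # 2
```

mask.sum() (or np.count_nonzero) is usually preferable to the built-in sum(), since it stays in NumPy instead of looping over elements in Python.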