Yungpythonnoob
Yungpythonnoob

Reputation: 125

How to count trues and falses in a two column data frame?

Here is my code:

pizzarequests = pd.Series(open('pizza_requests.txt').read().splitlines())
line = "unix_timestamp_of_request_utc"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1]
print(lines)
dts = pd.to_datetime(lines, unit='s')
hours = dts.dt.hour
print(hours)


pizzarequests = pd.Series(open('pizza_requests.txt').read().splitlines())
line = "requester_received_pizza"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1]

data = pd.DataFrame({'houroftheday' : hours.values, 'successpizza' : lines})
print(data)

****Which gives me:****

        houroftheday successpizza
23                18        true 
67                 2        true 
105               14        true 
166               23        true 
258               20        true 
297                1        true 
340                2        true 
385               22        true 
...
304646            21       false 
304686            12       false 
304746             1       false 
304783             3       false 
304840            20       false 
304907            17       false 
304948             1       false 
305023             4       false 

How can I sum the hours that only correspond to the trues?

Upvotes: 1

Views: 165

Answers (2)

jezrael
jezrael

Reputation: 863731

First filter all rows by Trues in column successpizza and then sum column houroftheday:

sum_hour = data.loc[data['successpizza'] == 'true', 'houroftheday'].sum()
print (sum_hour)
102

If want size is necessary only count Trues, if use sum, Trues are processes like 1:

len_hour = (data['successpizza'] == 'true').sum()
print (len_hour)
8

Or if need length of each houroftheday:

mask = (data['successpizza']  == 'true').astype(int)
out = mask.groupby(data['houroftheday']).sum()
print (out)
houroftheday
1     1
2     2
3     0
12    0
14    1
18    1
20    1
21    0
22    1
23    1
Name: successpizza, dtype: int32

Solution for remove traling whitespaces is str.strip:

line = "requester_received_pizza"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1].str.strip()

Upvotes: 1

James
James

Reputation: 36781

I think you want a count of the occurrences of each hour where successpizza is true. If so you will want to slice the data frame using successpizza, then groupby the houroftheday column and aggregate using a count.

It also looks like you are reading in the true/false values from a file, so they are strings. You will need to convert them first.

data.successpizza = data.successpizza.apply(lambda x: x=='true')
data[data.successpizza].groupby('houroftheday').count()

Upvotes: 0

Related Questions