Reputation: 125
Here is my code:
pizzarequests = pd.Series(open('pizza_requests.txt').read().splitlines())
line = "unix_timestamp_of_request_utc"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1]
print(lines)
dts = pd.to_datetime(lines, unit='s')
hours = dts.dt.hour
print(hours)
pizzarequests = pd.Series(open('pizza_requests.txt').read().splitlines())
line = "requester_received_pizza"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1]
data = pd.DataFrame({'houroftheday' : hours.values, 'successpizza' : lines})
print(data)
****Which gives me:****
houroftheday successpizza
23 18 true
67 2 true
105 14 true
166 23 true
258 20 true
297 1 true
340 2 true
385 22 true
...
304646 21 false
304686 12 false
304746 1 false
304783 3 false
304840 20 false
304907 17 false
304948 1 false
305023 4 false
How can I sum the hours that only correspond to the trues?
Upvotes: 1
Views: 165
Reputation: 863731
First filter all rows by True
s in column successpizza
and then sum
column houroftheday
:
sum_hour = data.loc[data['successpizza'] == 'true', 'houroftheday'].sum()
print (sum_hour)
102
If want size
is necessary only count True
s, if use sum
, True
s are processes like 1
:
len_hour = (data['successpizza'] == 'true').sum()
print (len_hour)
8
Or if need length of each houroftheday
:
mask = (data['successpizza'] == 'true').astype(int)
out = mask.groupby(data['houroftheday']).sum()
print (out)
houroftheday
1 1
2 2
3 0
12 0
14 1
18 1
20 1
21 0
22 1
23 1
Name: successpizza, dtype: int32
Solution for remove traling whitespaces is str.strip
:
line = "requester_received_pizza"
lines = pizzarequests[pizzarequests.str.contains(line)].str.split(",").str[1].str.strip()
Upvotes: 1
Reputation: 36781
I think you want a count of the occurrences of each hour where successpizza
is true
. If so you will want to slice the data frame using successpizza
, then groupby
the houroftheday
column and aggregate using a count.
It also looks like you are reading in the true/false values from a file, so they are strings. You will need to convert them first.
data.successpizza = data.successpizza.apply(lambda x: x=='true')
data[data.successpizza].groupby('houroftheday').count()
Upvotes: 0