Reputation: 265
I have a data set with daily snowfall records for one year. The Date variable is in YYYYMMDD form.
Date Snow
20010101 0
20010102 10
20010103 5
20010104 3
20010105 0
...
20011231 0
The actual data is here
https://github.com/emily737373/emily737373/blob/master/COX_SNOW-1.csv
I want to calculate the number of days it snowed each month. I know how to do this with pandas, but for a school project I need to do it using only numpy. I cannot import datetime either; it must be done with numpy alone.
The output should be in this form
Month # days snowed
January 13
February 19
March 20
...
December 15
My question is: how do I count only the days it snowed (that is, days when the Snow variable is not 0) without having to do it separately for each month?
Upvotes: 0
Views: 146
Reputation: 1501
I hope you can use some built-in packages, such as datetime, because it's useful when working with dates.
import numpy as np
import datetime as dt

# Load every column as str so the Date column can be parsed with strptime.
df = np.genfromtxt('test_files/COX_SNOW-1.csv', delimiter=',', skip_header=1, dtype=str)
# Month number (1-12) for every row.
date = np.array([dt.datetime.strptime(d, "%Y%m%d").month for d in df[:, 0]])
snow = df[:, 1].copy().astype(np.int32)
has_snowed = snow > 0  # boolean mask: True on days with snowfall

for month in range(1, 13):
    month_str = dt.datetime(year=1, month=month, day=1).strftime('%B')
    # Number of days in this month on which it snowed.
    how_much_snow = len(snow[has_snowed & (date == month)])
    print(month_str, ':', how_much_snow)
I loaded the data as str so we guarantee we can parse the Date column as dates later on. That's why we also need to explicitly convert the snow column to int32, otherwise the > comparison won't work.
The output is as follows:
January : 13
February : 19
March : 20
April : 13
May : 8
June : 9
July : 2
August : 7
September : 9
October : 19
November : 16
December : 15
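Since the question rules out datetime, here is a minimal numpy-only sketch of the same idea. It assumes the dates can be read as plain integers in YYYYMMDD form, so the month falls out of integer division, and it uses np.bincount to tally all twelve months in one call instead of looping over them. The function name snow_days_per_month, the MONTH_NAMES list, and the toy arrays are made up for illustration; they are not taken from the real CSV.

```python
import numpy as np

# Month names written out by hand, since datetime is not allowed.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def snow_days_per_month(dates, snow):
    """Count days with snow > 0 per month, given integer YYYYMMDD dates."""
    dates = np.asarray(dates, dtype=np.int64)
    snow = np.asarray(snow)
    # 20010102 // 100 -> 200101, then % 100 -> 1 (the month number).
    months = (dates // 100) % 100
    # bincount tallies how many snowy days fall on each month number.
    # minlength=13 keeps a slot for every month; index 0 is unused.
    return np.bincount(months[snow > 0], minlength=13)[1:13]


# Toy data for illustration only (not the real file).
dates = np.array([20010101, 20010102, 20010103, 20010201, 20011231])
snow = np.array([0, 10, 5, 3, 0])

counts = snow_days_per_month(dates, snow)
for name, n in zip(MONTH_NAMES, counts):
    print(name, ":", n)
```

For the real file, the two columns could presumably be loaded with np.genfromtxt(..., delimiter=',', skip_header=1, dtype=np.int64) and passed in as data[:, 0] and data[:, 1], assuming every row holds two clean integers.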
Let me know if this worked for you or if you have any further questions.
Upvotes: 2