Reputation: 7288
I have a list:
t = [['01-2012', 3],
['02-2012', 2],
['03-2012', 9],
['04-2012', 1],
['05-2012', 6],
['06-2012', 40],
['07-2012', 3],
['08-2012', 282],
['09-2012', 3],
['12-2012', 6],
['02-2013', 16],
['04-2013', 9]]
I need to add all the missing [mm-yyyy, value] elements on their respective places within this list as [mm-yyyy, 0], so as to smooth the time series across all the missing months. Any guesses?
Right now what I am doing is calculating all the 0 values as follows:
total_items = [0]*12
for d in t:
month = int(d[0].split('-')[0], 10)
total_items[month-1] = d[1]
so this gives me all the values including the ones for missing months as 0, but I am not sure how to generate the corresponding mm-yyyy element for them. Any help would be appreciated. Thanks.
Upvotes: 1
Views: 1262
Reputation: 25692
You can also use the pandas
library for such things
t = [['01-2012', 3],
['02-2012', 2],
['03-2012', 9],
['04-2012', 1],
['05-2012', 6],
['06-2012', 40],
['07-2012', 3],
['08-2012', 282],
['09-2012', 3],
['12-2012', 6],
['02-2013', 16],
['04-2013', 9]]
t = array(t, dtype=object)
s = Series(t[:, 1], index=pd.to_datetime(t[:, 0])).convert_objects().sort_index()
s.resample('MS').fillna(0)
to get
2012-01-01 3
2012-02-01 2
2012-03-01 9
2012-04-01 1
2012-05-01 6
2012-06-01 40
2012-07-01 3
2012-08-01 282
2012-09-01 3
2012-10-01 0
2012-11-01 0
2012-12-01 6
2013-01-01 0
2013-02-01 16
2013-03-01 0
2013-04-01 9
Freq: MS, dtype: float64
Note that I've arbitrarily used the first of the month here.
Upvotes: 1
Reputation: 365955
I think a simpler solution is to just iterate through the months and years in the range, and add the ones that are missing.
This would be easier if you used a dict
instead of a list
of two-element list
s, so let's do that first.
data = dict(t)
for year in range(2012, 2014):
for month in range(1, 13):
mmyyyy = '{:02}-{:04}'.format(month, year)
data.setdefault(mmyyyy, 0)
Then, if you want to convert it back to your original format, that's easy too:
t = [[k, v] for k, v in data.items()]
And if you need them sorted by date… Well, your month-first format makes that ugly, but it's certainly doable:
t = sorted(t, key=lambda kv: kv[0][3:] + kv[0][:2])
But, as that last line shows, it's often a whole lot easier to deal with dates as date objects rather than strings. So, you might want to consider that as well.
However, maybe rather than inserting the 0s, you'd be better off just keeping 0 as a default and processing them by looping over years and months instead of by looping over the collection?
Upvotes: 1