saifuddin778
saifuddin778

Reputation: 7288

Insert Missing Month Values as 0 in Python List

I have a list:

t = [['01-2012', 3], 
     ['02-2012', 2], 
     ['03-2012', 9], 
     ['04-2012', 1], 
     ['05-2012', 6], 
     ['06-2012', 40], 
     ['07-2012', 3], 
     ['08-2012', 282], 
     ['09-2012', 3], 
     ['12-2012', 6],
     ['02-2013', 16],
     ['04-2013', 9]]

I need to add all the missing [mm-yyyy, value] elements on their respective places within this list as [mm-yyyy, 0], so as to smooth the time series across all the missing months. Any guesses?

Right now what I am doing is calculating all the 0 values as follows:

total_items = [0]*12

for d in t:
    month = int(d[0].split('-')[0], 10)
    total_items[month-1] = d[1]

so this gives me all the values including the ones for missing months as 0, but I am not sure how to generate the corresponding mm-yyyy element for them. Any help would be appreciated. Thanks.

Upvotes: 1

Views: 1262

Answers (2)

Phillip Cloud
Phillip Cloud

Reputation: 25692

You can also use the pandas library for such things

t = [['01-2012', 3], 
     ['02-2012', 2], 
     ['03-2012', 9], 
     ['04-2012', 1], 
     ['05-2012', 6], 
     ['06-2012', 40], 
     ['07-2012', 3], 
     ['08-2012', 282], 
     ['09-2012', 3], 
     ['12-2012', 6],
     ['02-2013', 16],
     ['04-2013', 9]]
t = array(t, dtype=object)
s = Series(t[:, 1], index=pd.to_datetime(t[:, 0])).convert_objects().sort_index()
s.resample('MS').fillna(0)

to get

2012-01-01      3
2012-02-01      2
2012-03-01      9
2012-04-01      1
2012-05-01      6
2012-06-01     40
2012-07-01      3
2012-08-01    282
2012-09-01      3
2012-10-01      0
2012-11-01      0
2012-12-01      6
2013-01-01      0
2013-02-01     16
2013-03-01      0
2013-04-01      9
Freq: MS, dtype: float64

Note that I've arbitrarily used the first of the month here.

Upvotes: 1

abarnert
abarnert

Reputation: 365955

I think a simpler solution is to just iterate through the months and years in the range, and add the ones that are missing.


This would be easier if you used a dict instead of a list of two-element lists, so let's do that first.

data = dict(t)
for year in range(2012, 2014):
    for month in range(1, 13):
        mmyyyy = '{:02}-{:04}'.format(month, year)
        data.setdefault(mmyyyy, 0)

Then, if you want to convert it back to your original format, that's easy too:

t = [[k, v] for k, v in data.items()]

And if you need them sorted by date… Well, your month-first format makes that ugly, but it's certainly doable:

t = sorted(t, key=lambda kv: kv[0][3:] + kv[0][:2])

But, as that last line shows, it's often a whole lot easier to deal with dates as date objects rather than strings. So, you might want to consider that as well.


However, maybe rather than inserting the 0s, you'd be better off just keeping 0 as a default and processing them by looping over years and months instead of by looping over the collection?

Upvotes: 1

Related Questions