jenryb

Reputation: 2117

Making a list based off of list of months in Python

I am using Python to make lists. Should be easy! I don't know why I'm struggling so much with this.

I have some data that I am counting up by date. There is a date column like this:

Created on
5/1/2015
5/1/2015
6/1/2015
6/1/2015
7/1/2015
8/1/2015
8/1/2015
8/1/2015

In this case, there would be 2 Units created in May, 2 Units in June, 1 Unit in July, and 3 Units in August.

I want to reflect that in a list that starts in April ([April counts, May counts, June counts, etc...]):

NumberofUnits = [0, 2, 2, 1, 3, 0, 0, 0, 0, 0, 0, 0]

I have a nice list of months

monthnumbers

Out[69]: [8, 5, 6, 7]

I also have a list of the unit counts, unitcounts = [2, 3, 1, 3], which I got using value_counts.

So it's a matter of making a list of zeroes and replacing parts with the unitcount list, right?

For some reason all of my tries are either not making a list or making a list with one zero in it.

NumberofUnits = [0]*12

for i in range(0,len(monthnumbers)):
    if monthnumbers[i] == (i+4):  # This part is wrong
        NumberofUnits.append(unitcounts[i])
        s = slice(0,i+1)

I also tried

NumberofUnits = []
for i in range(0, 12):
    if len(NumberofUnits) > i:
        unitcounts[i:]+unitcounts[:i]
        NumberofUnits.append(unitcounts[i])
        s = slice(0,i+1)
    else:
        unitcounts.append(0)

But this doesn't account for the fact that in this round my data starts with May, so I need a zero in the first slot.

Upvotes: 3

Views: 136

Answers (4)

Martin Evans

Reputation: 46759

The following is a more "old school" approach. It assumes your dates are in the first column of your CSV file, i.e. cols[0]. It validates the input dates: a ValueError exception is raised if a date is not valid or if it is older than the previous one. It also copes if your input skips one or more months.

import csv
from datetime import datetime

with open("input.csv", "r") as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    last_date = datetime(year=2015, month=4, day=1)
    cur_total = 0
    units_by_month = []

    for cols in csv_input:
        cur_date = datetime.strptime(cols[0], "%m/%d/%Y")

        if cur_date.month == last_date.month:
            cur_total += 1
        elif cur_date < last_date:
            raise ValueError("Date is older")
        else:
            extra_months = ((cur_date.month + 12 - last_date.month) if cur_date.year - last_date.year else (cur_date.month - last_date.month)) - 1
            units_by_month.extend([cur_total] + ([0] * extra_months))
            last_date = cur_date
            cur_total = 1

    units_by_month.extend([cur_total] + [0] * ((8-len(units_by_month)) if len(units_by_month) < 9 else 0))
    print(units_by_month)

So for your input it will give the following output:

[0, 2, 2, 1, 3, 0, 0, 0, 0]

If one extra entry was added 3/1/2016, the following would be displayed:

[0, 2, 2, 1, 3, 0, 0, 0, 0, 0, 0, 1]
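
The month-skipping logic above can also be expressed by turning each date into an index relative to the starting month. This is a minimal sketch rather than the code from this answer: the function name units_by_month and its start and length parameters are made up for illustration, and it assumes the dates are m/d/Y strings like those in the question.

```python
from datetime import datetime

def units_by_month(dates, start=datetime(2015, 4, 1), length=12):
    """Count dates per month in a window of `length` months beginning at `start`.

    Hypothetical helper for illustration; assumes dates look like '5/1/2015'.
    """
    counts = [0] * length
    for text in dates:
        d = datetime.strptime(text, "%m/%d/%Y")  # raises ValueError on bad input
        # Number of whole months between the start month and this date's month
        index = (d.year - start.year) * 12 + (d.month - start.month)
        if not 0 <= index < length:
            raise ValueError("Date outside the window: " + text)
        counts[index] += 1
    return counts

dates = ["5/1/2015", "5/1/2015", "6/1/2015", "6/1/2015",
         "7/1/2015", "8/1/2015", "8/1/2015", "8/1/2015"]
result = units_by_month(dates)
print(result)
```

Because every date is placed by its own index, this version does not need the rows to be sorted by date.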

Upvotes: 0

Andrey

Reputation: 60065

Why not just:

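counter = [0]*12
# monthnumbers needs one entry per row here (e.g. [5, 5, 6, 6, 7, 8, 8, 8]),
# not just the unique months
for m in monthnumbers:
    counter[(m - 4) % 12] += 1

print(counter)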
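
If you only have the aggregated lists from the question rather than one month per row, the same modulo trick works by adding each count instead of 1. A small sketch; the unitcounts values here are an assumption, chosen to agree with the sample dates (3 in August, 2 in May, 2 in June, 1 in July):

```python
monthnumbers = [8, 5, 6, 7]
unitcounts = [3, 2, 2, 1]  # assumed values, aligned position by position with monthnumbers

counter = [0] * 12
for month, count in zip(monthnumbers, unitcounts):
    # April (4) maps to index 0; March (3) wraps around to index 11
    counter[(month - 4) % 12] += count

print(counter)
```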

Upvotes: 1

zero323

Reputation: 330193

You can count entries using collections.Counter:

from collections import Counter

lines = ['5/1/2015', '5/1/2015', ..., '8/1/2015']
month_numbers = [int(line.split("/")[0]) for line in lines]

cnt = Counter(month_numbers)

If you already have counts you can replace above with

from collections import defaultdict

cnt = defaultdict(int, zip(monthnumbers, unitcounts))

and simply map to entries with (month_number - offset) mod 12, where offset is the month the list should start with (4 for April):

[x[1] for x in sorted([((i - offset) % 12, cnt[i]) for i in range(1, 13)])]
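
Putting the pieces together on the sample dates from the question, with offset set to 4 so that the list begins in April. This is only a sketch: the name number_of_units is chosen here, and instead of sorting it walks the 12 output slots directly, which should give the same result as the expression above.

```python
from collections import Counter

lines = ["5/1/2015", "5/1/2015", "6/1/2015", "6/1/2015",
         "7/1/2015", "8/1/2015", "8/1/2015", "8/1/2015"]
offset = 4  # first month of the output list (April), taken from the question

cnt = Counter(int(line.split("/")[0]) for line in lines)

# Walk the 12 slots in output order and look up the month each slot stands for.
# A Counter returns 0 for months that never occurred.
number_of_units = [cnt[(slot + offset - 1) % 12 + 1] for slot in range(12)]
print(number_of_units)
```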

Upvotes: 1

Padraic Cunningham

Reputation: 180441

If the data is coming from a file or any iterable, you can use an OrderedDict. Create the keys in order starting from 4 (April), increment the count for each month you encounter, and finally print the list of values, which will be in the required order:

from collections import OrderedDict

od = OrderedDict((i % 12 or 12, 0) for i in range(4, 16))
# -> OrderedDict([(4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (10, 0), (11, 0), (12, 0), (1, 0), (2, 0), (3, 0)])

with open("in.txt") as f:
    for line in f:
        mn = int(line.split("/",1)[0])
        od.setdefault(mn, 0)
        od[mn] += 1

print(list(od.values()))
[0, 2, 2, 1, 3, 0, 0, 0, 0, 0, 0, 0]

Unless you use logic like the above, associating the data with its month when you actually parse it, it is going to be a lot harder to figure out which count belongs to which month. Creating the association straight away is a much simpler approach.

If you have a list, tuple, etc. of values, the logic is exactly the same:

for dte in list_of_dates:
    mn = int(dte.split("/",1)[0])
    od.setdefault(mn, 0)
    od[mn] += 1
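
For trying this without a file, the same logic can be wrapped in a function. A minimal sketch: month_counts and its first_month parameter are hypothetical names, list_of_dates holds the sample dates from the question, and setdefault is left out here because every month key is created up front.

```python
from collections import OrderedDict

def month_counts(dates, first_month=4):
    """Hypothetical wrapper: counts per month, ordered starting at first_month."""
    # Keys run first_month..12 and then wrap to 1..first_month-1
    od = OrderedDict(((first_month - 1 + i) % 12 + 1, 0) for i in range(12))
    for dte in dates:
        od[int(dte.split("/", 1)[0])] += 1
    return list(od.values())

list_of_dates = ["5/1/2015", "5/1/2015", "6/1/2015", "6/1/2015",
                 "7/1/2015", "8/1/2015", "8/1/2015", "8/1/2015"]
print(month_counts(list_of_dates))
```

Changing first_month shifts where the list starts, so first_month=1 would give a plain January to December ordering.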

Upvotes: 1
