Can anyone help me iterate through a list of dictionaries containing dates? I have a data set like this:
data=[{u'a': u'D', u'b': 100.0, u'c': 201L, u'd': datetime.datetime(2007, 12, 29, 0, 0), u'e': datetime.datetime(2008, 1, 1, 6, 27, 41)},
{u'a': u'W', u'b': 100.0, u'c': 201L, u'd': datetime.datetime(2007, 12, 29, 0, 0), u'e': datetime.datetime(2008, 2, 4, 6, 27, 41)},
{u'a': u'W', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 1, 1, 4, 20, 44)},
{u'a': u'D', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 3, 11, 6, 27, 41)},
{u'a': u'D', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 5, 8, 11, 2, 41)},
{u'a': u'D', u'b': 100.0, u'c': 203L, u'd': datetime.datetime(2008, 1, 2, 0, 0), u'e': datetime.datetime(2008, 6, 1, 6, 27, 41)},
{u'a': u'W', u'b': 100.0, u'c': 204L, u'd': datetime.datetime(2008, 2, 9, 0, 0), u'e': datetime.datetime(2008, 4, 21, 12, 30, 51)},
{u'a': u'D', u'b': 100.0, u'c': 204L, u'd': datetime.datetime(2008, 2, 9, 0, 0), u'e': datetime.datetime(2008, 8, 15, 15, 45, 10)}]
How can I turn it into a dictionary in the format below?
res = {201L: [(1,0,1), (2,1,0), (3,0,0), (4,0,0), ... and so on up to (12,0,0)],
       202L: [(1,1,0), (2,0,0), (3,0,1), (4,0,0), (5,0,1), ..., (12,0,0)],
       203L: [(1,0,0), (2,0,0), (3,0,0), (4,0,0), (5,0,1), ..., (12,0,0)],
       204L: [(1,0,0), (2,0,0), (3,1,0), (4,0,0), (5,0,0), (6,0,0), (7,0,1), ..., (12,0,0)]}
Here 1, 2, 3, ... mean the first, second, third month and so on, counted from the card issue date (the 'd' field). For example, for 201L
the issue date is datetime.datetime(2007, 12, 29, 0, 0), and for 202L
it is datetime.datetime(2007, 12, 30, 0, 0).
So for 201L the first month runs from 2007-12-29 to 2008-01-29.
In (1,0,1):
1 is the month number,
0 is the number of W transactions,
1 is the number of D transactions.
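I tried something like this:
from collections import Counter, defaultdict
import datetime

data_dict = defaultdict(Counter)
date_dic = {}
for x in data:
    a, b, c, d = x['a'], x['c'], x['d'], x['e']
    data_dict[b][a] += 1  # counts D/W per card, but not per month yet
for key, value in data_dict.items():
    # note: c and d here are left over from the last row of data
    date_dic[key] = tuple(map(datetime.date.isoformat, (c, d)))
    for value in range(1, 30):
        if value not in x:
            continue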
I am stuck after the if statement. What can I add to get the format above? Right now my output looks like this:
defaultdict(<class 'collections.Counter'>, {201L: Counter({u'D': 1, u'W': 1}), 202L: Counter({u'D': 2, u'W': 1}), 203L: Counter({u'D': 1}), 204L: Counter({u'D': 1, u'W': 1})})
I'd create a list of dates, then use that list to find the 'bucket' each item belongs in.
You can create new dates relative to a starting point using datetime.timedelta() objects:
startdate = data[0]['d']
buckets = [startdate + datetime.timedelta(days=30) * i for i in xrange(12)]
Now you have 12 dates to compare everything else against, so you know what bucket to put each subsequent value in:
>>> buckets
[datetime.datetime(2007, 12, 29, 0, 0), datetime.datetime(2008, 1, 28, 0, 0), datetime.datetime(2008, 2, 27, 0, 0), datetime.datetime(2008, 3, 28, 0, 0), datetime.datetime(2008, 4, 27, 0, 0), datetime.datetime(2008, 5, 27, 0, 0), datetime.datetime(2008, 6, 26, 0, 0), datetime.datetime(2008, 7, 26, 0, 0), datetime.datetime(2008, 8, 25, 0, 0), datetime.datetime(2008, 9, 24, 0, 0), datetime.datetime(2008, 10, 24, 0, 0), datetime.datetime(2008, 11, 23, 0, 0)]
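These are fixed 30-day steps, so they drift away from the calendar months described in the question (2007-12-29 to 2008-01-29). The second bucket already starts on 2008-01-28. If you need real calendar months, one option is to build the bucket starts with month arithmetic instead. The add_months helper below is a hypothetical, stdlib-only sketch in Python 3. It assumes that a day missing from the target month is clamped to that month's last day, so Jan 31 + 1 month becomes Feb 29 in 2008.

```python
import calendar
import datetime


def add_months(start, months):
    """Hypothetical helper: shift `start` by whole calendar months.

    If the target month is too short (e.g. Jan 31 + 1 month), the day is
    clamped to that month's last day.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return start.replace(year=year, month=month, day=day)


startdate = datetime.datetime(2007, 12, 29)
# 12 calendar-month bucket starts, each computed from startdate itself
buckets = [add_months(startdate, i) for i in range(12)]
print(buckets[1], buckets[2])  # 2008-01-29 00:00:00 2008-02-29 00:00:00
```

Each start is computed from startdate rather than from the previous bucket. This means clamping in a short month does not shift the later buckets. You can use the resulting list with bisect exactly like the 30-day list.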
We can then use the bisect module to find the matching bucket:
from bisect import bisect
bisect(buckets, somedate) - 1 # Returns a value from 0 - 11
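Here is a small Python 3 sketch (range instead of xrange) of how bisect() - 1 maps dates onto the bucket list. It includes two edge cases worth knowing about. The bucket_index helper is only for illustration.

```python
import datetime
from bisect import bisect

start = datetime.datetime(2007, 12, 29)
# the first three 30-day bucket start dates, built as in the list above
buckets = [start + datetime.timedelta(days=30) * i for i in range(3)]


def bucket_index(when):
    # bisect() returns the insertion point *after* any equal entries,
    # so subtracting 1 gives the last bucket whose start is <= when
    return bisect(buckets, when) - 1


print(bucket_index(datetime.datetime(2008, 1, 1, 6, 27, 41)))  # 0
print(bucket_index(datetime.datetime(2008, 1, 28)))            # 1, exactly on a bucket start
print(bucket_index(datetime.datetime(2008, 2, 4, 6, 27, 41)))  # 1
print(bucket_index(datetime.datetime(2007, 12, 1)))            # -1, before the first bucket
```

Note the last case. A date before the first bucket yields -1, which as a list index silently selects the last bucket. You may want to guard against transactions that precede the issue date.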
We create these buckets per user, so we need to keep track of them in a separate mapping. We'll actually create the buckets on the fly, as needed, to fit each transaction date.
Next, we use a collections.defaultdict instance to track per-key tallies (key c in your input):
from collections import defaultdict
res = defaultdict(list)
empty_counts = {'D': 0, 'W': 0}
This creates a list per key to hold the bucket counts, and an empty counts dictionary for the deposits and withdrawals. I used a dictionary here because it is much easier to work with than having to manipulate (immutable) tuples later on. I also did not include the month number (1 - 12). There is no point, since you already have an index for each bucket (0 - 11), and you can have a variable number of buckets.
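If you do want the (month, W, D) tuples from the question, the bucket index is enough to recover the month number with enumerate(start=1). This is a minimal Python 3 sketch. It assumes res holds lists of {'D': n, 'W': m} dicts like the result printed further down (keys written as plain ints). to_tuples is a hypothetical helper that also pads to 12 months. With 30-day buckets, these "months" are 30-day periods, not calendar months.

```python
def to_tuples(bucket_counts, months=12):
    """Hypothetical helper: turn [{'D': n, 'W': m}, ...] into (month, W, D) tuples.

    The 0-based list index becomes the 1-based month number; missing
    trailing months are padded with zero counts up to `months`.
    """
    rows = [(i, counts['W'], counts['D'])
            for i, counts in enumerate(bucket_counts, start=1)]
    rows.extend((i, 0, 0) for i in range(len(rows) + 1, months + 1))
    return rows


res = {
    201: [{'D': 1, 'W': 0}, {'D': 0, 'W': 1}],
    202: [{'D': 0, 'W': 1}, {'D': 0, 'W': 0}, {'D': 1, 'W': 0},
          {'D': 0, 'W': 0}, {'D': 1, 'W': 0}],
}
formatted = {card: to_tuples(counts) for card, counts in res.items()}
print(formatted[201][:3])  # [(1, 0, 1), (2, 1, 0), (3, 0, 0)]
```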
We need to create buckets and counters as needed to fit the current date. Instead of scanning through the data to find the maximum transaction date per user, we just expand our buckets and counts lists as needed:
def expand_buckets(buckets, bucket_counts, start, transaction):
    # This function modifies the buckets and bucket_counts lists in-place
    if not buckets:
        # initialize the lists
        buckets.append(start)
        bucket_counts.append(dict(empty_counts))  # a fresh copy per bucket
    # keep adding 30-day spans until we can fit the transaction date
    # (<= so that a date exactly on a boundary opens the next bucket)
    while buckets[-1] + datetime.timedelta(days=30) <= transaction:
        buckets.append(buckets[-1] + datetime.timedelta(days=30))
        bucket_counts.append(dict(empty_counts))
Now we can start counting:
per_user_buckets = defaultdict(list)
for entry in data:
    user = entry['c']
    txn_type = entry['a']
    transaction_date = entry['e']
    buckets = per_user_buckets[user]
    bucket_counts = res[user]
    expand_buckets(buckets, bucket_counts, entry['d'], transaction_date)
    # count transaction date entries per bucket
    bucket = bisect(buckets, transaction_date) - 1
    bucket_counts[bucket][txn_type] += 1
The bisect call makes picking the right bucket easy and fast.
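For reference, here are the pieces above assembled into one self-contained script. It is adapted to Python 3 (plain ints such as 201 instead of 201L, plain strings instead of u'...') and fed with the question's sample rows, keeping only the fields 'a', 'c', 'd' and 'e'. The names SPAN, EMPTY_COUNTS and count_per_bucket are my own. The loop condition uses <= so that a transaction falling exactly on a 30-day boundary opens the next bucket.

```python
import datetime
from bisect import bisect
from collections import defaultdict
from pprint import pprint

dt = datetime.datetime
# The question's sample rows, with only the fields used here.
data = [
    {'a': 'D', 'c': 201, 'd': dt(2007, 12, 29), 'e': dt(2008, 1, 1, 6, 27, 41)},
    {'a': 'W', 'c': 201, 'd': dt(2007, 12, 29), 'e': dt(2008, 2, 4, 6, 27, 41)},
    {'a': 'W', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 1, 1, 4, 20, 44)},
    {'a': 'D', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 3, 11, 6, 27, 41)},
    {'a': 'D', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 5, 8, 11, 2, 41)},
    {'a': 'D', 'c': 203, 'd': dt(2008, 1, 2), 'e': dt(2008, 6, 1, 6, 27, 41)},
    {'a': 'W', 'c': 204, 'd': dt(2008, 2, 9), 'e': dt(2008, 4, 21, 12, 30, 51)},
    {'a': 'D', 'c': 204, 'd': dt(2008, 2, 9), 'e': dt(2008, 8, 15, 15, 45, 10)},
]

SPAN = datetime.timedelta(days=30)
EMPTY_COUNTS = {'D': 0, 'W': 0}


def expand_buckets(buckets, bucket_counts, start, transaction):
    """Grow both lists in place until the last bucket covers `transaction`."""
    if not buckets:
        buckets.append(start)
        bucket_counts.append(dict(EMPTY_COUNTS))  # copy, never share one dict
    while buckets[-1] + SPAN <= transaction:
        buckets.append(buckets[-1] + SPAN)
        bucket_counts.append(dict(EMPTY_COUNTS))


def count_per_bucket(rows):
    res = defaultdict(list)               # card -> list of count dicts
    per_user_buckets = defaultdict(list)  # card -> list of bucket start dates
    for entry in rows:
        card, kind, when = entry['c'], entry['a'], entry['e']
        buckets = per_user_buckets[card]
        bucket_counts = res[card]
        expand_buckets(buckets, bucket_counts, entry['d'], when)
        bucket_counts[bisect(buckets, when) - 1][kind] += 1
    return dict(res)


result = count_per_bucket(data)
pprint(result)
```

On the sample data this produces the same counts as the output below.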
The result for your example input is:
>>> pprint(dict(res))
{201L: [{'D': 1, 'W': 0},
        {'D': 0, 'W': 1}],
 202L: [{'D': 0, 'W': 1},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}],
 203L: [{'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}],
 204L: [{'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 1},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}]}
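Alternatively, you may need calendar months as defined in the question, along with the exact (month, W, D) tuples. In that case you can skip the bucket lists, compute each transaction's month number arithmetically, and tally with a defaultdict(Counter) as in your own attempt. This is a Python 3 sketch in which month_number is a hypothetical helper. It assumes month n starts on the issue day-of-month (and time of day) and that transactions never precede the issue date. In short months that lack the issue day (for an issue date on the 29th to 31st), the boundary effectively moves to the 1st of the following month.

```python
import datetime
from collections import Counter, defaultdict

dt = datetime.datetime
# The question's sample rows (Python 3 literals, only the fields used here).
data = [
    {'a': 'D', 'c': 201, 'd': dt(2007, 12, 29), 'e': dt(2008, 1, 1, 6, 27, 41)},
    {'a': 'W', 'c': 201, 'd': dt(2007, 12, 29), 'e': dt(2008, 2, 4, 6, 27, 41)},
    {'a': 'W', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 1, 1, 4, 20, 44)},
    {'a': 'D', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 3, 11, 6, 27, 41)},
    {'a': 'D', 'c': 202, 'd': dt(2007, 12, 30), 'e': dt(2008, 5, 8, 11, 2, 41)},
    {'a': 'D', 'c': 203, 'd': dt(2008, 1, 2), 'e': dt(2008, 6, 1, 6, 27, 41)},
    {'a': 'W', 'c': 204, 'd': dt(2008, 2, 9), 'e': dt(2008, 4, 21, 12, 30, 51)},
    {'a': 'D', 'c': 204, 'd': dt(2008, 2, 9), 'e': dt(2008, 8, 15, 15, 45, 10)},
]


def month_number(issue, when):
    """Hypothetical helper: 1-based calendar month of `when` since `issue`.

    Assumes month n starts on the issue day-of-month (and time of day).
    """
    months = (when.year - issue.year) * 12 + (when.month - issue.month)
    # Issue day/time not yet reached in this calendar month: previous period.
    if (when.day, when.time()) < (issue.day, issue.time()):
        months -= 1
    return months + 1


# (card, month) -> Counter({'D': n, 'W': m}); missing keys count as 0
counts = defaultdict(Counter)
for entry in data:
    counts[entry['c'], month_number(entry['d'], entry['e'])][entry['a']] += 1

# card -> [(month, W count, D count), ...] for months 1 to 12
res = {
    card: [(m, counts[card, m]['W'], counts[card, m]['D']) for m in range(1, 13)]
    for card in sorted({entry['c'] for entry in data})
}
print(res[201][:3])  # [(1, 0, 1), (2, 1, 0), (3, 0, 0)]
```

For the sample data this places 203L's June 1 deposit in month 5, as the question expects. The 30-day buckets above put it in the sixth bucket instead.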