I have a nested list whose rows contain several different fields. I am trying to compute the difference between two consecutive items and, if a condition matches, group those items together.
For example:
Item 1 happened on 1-6-2012 1 pm
Item 2 happened on 1-6-2012 4 pm
Item 3 happened on 1-6-2012 6 pm
Item 4 happened on 3-6-2012 5 pm
Item 5 happened on 5-6-2012 5 pm
I want to group together the items whose gap is less than 24 hours. In this case, items 1, 2 and 3 belong to one group, item 4 belongs to a second group and item 5 belongs to another group. I tried the following code:
import csv
from collections import defaultdict
from datetime import datetime

Time = []
All_Traps = []
Traps = []
Dic_Traps = defaultdict(list)
Traps_CSV = csv.reader(open("D:/Users/d774911/Desktop/Telstra Internship/Working files/Traps_Generic_Features.csv"))
for rows in Traps_CSV:
    All_Traps.append(rows)
All_Traps.sort(key=lambda x: x[9])
for length in xrange(len(All_Traps)):
    if length == (len(All_Traps) - 1):
        break
    Node_Name_1 = All_Traps[length][2]
    Node_Name_2 = All_Traps[length + 1][2]
    Event_Type_1 = All_Traps[length][5]
    Event_Type_2 = All_Traps[length + 1][5]
    Time_1 = All_Traps[length][9]
    Time_2 = All_Traps[length + 1][9]
    Difference = datetime.strptime(Time_2[0:19], '%Y-%m-%dT%H:%M:%S') - datetime.strptime(Time_1[0:19], '%Y-%m-%dT%H:%M:%S')
    if Node_Name_1 == Node_Name_2 and \
       Event_Type_1 == Event_Type_2 and \
       float(Difference.seconds) / (60*60) < 24:
        Dic_Traps[length].append(All_Traps[length])
But I am missing some items. Ideas?
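One thing worth checking in the code above: `timedelta.seconds` is only the seconds part of a timedelta (0 to 86399) and ignores its `days` part, so a gap of 2 days and 1 hour gives `Difference.seconds == 3600` and passes the `< 24` test. `timedelta.total_seconds()` covers the whole duration. A small demonstration, using two made-up timestamps in the `%Y-%m-%dT%H:%M:%S` format the question parses:

```python
from datetime import datetime

fmt = '%Y-%m-%dT%H:%M:%S'  # the format the question uses for column 9
# Hypothetical sample timestamps, 2 days and 1 hour apart.
earlier = datetime.strptime('2012-06-01T16:00:00', fmt)
later = datetime.strptime('2012-06-03T17:00:00', fmt)
difference = later - earlier  # timedelta(days=2, seconds=3600)

# .seconds drops the 2 days, so this wrongly looks like a 1-hour gap.
hours_from_seconds = difference.seconds / 3600.0
# total_seconds() includes the days: 49 hours.
hours_total = difference.total_seconds() / 3600.0

print(hours_from_seconds, hours_total)  # 1.0 49.0
```

Separately, `Dic_Traps[length].append(...)` stores only the first row of each matching pair, under that row's own index, so the last row of every run of close items is never added and no key ever holds more than one row.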
First of all, change those horribly cased variable names. Python has its own naming conventions, described in PEP 8: variables, functions and methods use snake_case, while classes use CapWords.
Now, on to what you need to do:
import datetime as dt
import pprint

item_timestamp_dict = {}
with open('timex.dat') as f:
    for line in f.read().splitlines():
        if line:
            # "Item 1 happened on 1-6-2012 1 pm" -> item "1"
            item = line.split('happened')[0].strip().split(' ')[1]
            # -> "1-6-2012 1 pm" (day-month-year, 12-hour clock)
            timestamp_string = line.split(' on ')[-1].strip()
            # %I and %p read the 12-hour clock, so "1 pm" becomes 13:00
            item_timestamp_dict[item] = dt.datetime.strptime(timestamp_string, "%d-%m-%Y %I %p")
pprint.pprint(item_timestamp_dict)
This is a hackish way of giving you this:
{'1': datetime.datetime(2012, 6, 1, 13, 0),
 '2': datetime.datetime(2012, 6, 1, 16, 0),
 '3': datetime.datetime(2012, 6, 1, 18, 0),
 '4': datetime.datetime(2012, 6, 3, 17, 0),
 '5': datetime.datetime(2012, 6, 5, 17, 0)}
That is a dictionary with the item number as key and its datetime timestamp as value.
You can then use these datetime values to do your calculation: subtracting two of them (e.g. item_timestamp_dict['2'] - item_timestamp_dict['1']) gives a timedelta, which, unlike item_timestamp_dict['1'].hour, also takes the date into account.
EDIT: It can be optimized a lot.
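Applied to the whole dictionary, the calculation could look like the sketch below: sort the items by timestamp and start a new group whenever the timedelta to the previous item reaches 24 hours. This is a minimal sketch, assuming a dictionary shaped like the one above with the example times written on a 24-hour clock; `max_gap` and `groups` are illustrative names, not from either post.

```python
import datetime

# The question's example on a 24-hour clock (e.g. "1-6-2012 1 pm" -> 13:00).
item_timestamp_dict = {
    '1': datetime.datetime(2012, 6, 1, 13, 0),
    '2': datetime.datetime(2012, 6, 1, 16, 0),
    '3': datetime.datetime(2012, 6, 1, 18, 0),
    '4': datetime.datetime(2012, 6, 3, 17, 0),
    '5': datetime.datetime(2012, 6, 5, 17, 0),
}
max_gap = datetime.timedelta(hours=24)

groups = []
previous_time = None
# Walk the items in timestamp order; a gap of 24 hours or more starts a new group.
for item, timestamp in sorted(item_timestamp_dict.items(), key=lambda pair: pair[1]):
    if previous_time is None or timestamp - previous_time >= max_gap:
        groups.append([])
    groups[-1].append(item)
    previous_time = timestamp

print(groups)  # [['1', '2', '3'], ['4'], ['5']]
```

Because each item is compared only with the one just before it, a group can span more than 24 hours in total as long as no single gap reaches 24 hours. The question's extra conditions (same node name and event type) could be added to the `if` test.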
For a sorted list you can use itertools.groupby. Here is a simplified example (you should first convert your date strings to datetime objects); it should give you the main idea:
from itertools import groupby
import datetime

SRC_DATA = [
    (1, datetime.datetime(2015, 6, 20, 1)),
    (2, datetime.datetime(2015, 6, 20, 4)),
    (3, datetime.datetime(2015, 6, 20, 5)),
    (4, datetime.datetime(2015, 6, 21, 1)),
    (5, datetime.datetime(2015, 6, 22, 1)),
    (6, datetime.datetime(2015, 6, 22, 4)),
]

# groupby only merges adjacent records, so the data must be sorted by time;
# the key groups records by calendar date.
for group_date, group in groupby(SRC_DATA, key=lambda x: x[1].date()):
    print("Group {}: {}".format(group_date, list(group)))
Output:
$ python python_groupby.py
Group 2015-06-20: [(1, datetime.datetime(2015, 6, 20, 1, 0)), (2, datetime.datetime(2015, 6, 20, 4, 0)), (3, datetime.datetime(2015, 6, 20, 5, 0))]
Group 2015-06-21: [(4, datetime.datetime(2015, 6, 21, 1, 0))]
Group 2015-06-22: [(5, datetime.datetime(2015, 6, 22, 1, 0)), (6, datetime.datetime(2015, 6, 22, 4, 0))]
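Keying on `.date()` groups by calendar day, which is not quite the question's rule of gaps under 24 hours: two items at 23:00 and 01:00 on consecutive days would be split although they are only 2 hours apart. One way to keep using groupby is a stateful key that remembers the previous timestamp and increases a group number whenever the gap reaches 24 hours. A sketch, assuming the records are sorted by time; `GapKey` is a hypothetical helper, not part of itertools.

```python
import datetime
from itertools import groupby


class GapKey(object):
    """Hypothetical stateful key for groupby: returns a group number that goes
    up by one whenever a record comes max_gap or more after the previous one."""

    def __init__(self, max_gap=datetime.timedelta(hours=24)):
        self.max_gap = max_gap
        self.previous = None
        self.group = 0

    def __call__(self, record):
        timestamp = record[1]
        if self.previous is not None and timestamp - self.previous >= self.max_gap:
            self.group += 1
        self.previous = timestamp
        return self.group


SRC_DATA = [
    (1, datetime.datetime(2015, 6, 20, 1)),
    (2, datetime.datetime(2015, 6, 20, 4)),
    (3, datetime.datetime(2015, 6, 20, 5)),
    (4, datetime.datetime(2015, 6, 21, 1)),
    (5, datetime.datetime(2015, 6, 22, 1)),
    (6, datetime.datetime(2015, 6, 22, 4)),
]

# groupby calls the key once per record, in order, so the counter sees every gap.
groups = [[item for item, _timestamp in group]
          for _group_number, group in groupby(SRC_DATA, key=GapKey())]
for number, items in enumerate(groups):
    print("Group {}: {}".format(number, items))
# Group 0: [1, 2, 3, 4]
# Group 1: [5, 6]
```

On this data item 4 joins the first group because it comes 20 hours after item 3, while item 5 starts a new group because its gap is exactly 24 hours, which is not less than 24.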