Reputation: 37
I have simplified my CSV as below:
Job,Task,Operator,Comment,Time,Required
Forklift,Crating,John,Daily Task,2020/11/23 00:14:59.280,Y
Truck,Crating,Steve,Daily Task,2020/11/23 01:14:59.280,N
Truck,Storage,Brain,Daily Task,2020/11/23 02:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 03:14:59.280,Y
Shop Floor,Stacking,Steve,Weekly,2020/11/23 04:14:59.280,N
Truck,Storage,Brain,Daily Task,2020/11/23 06:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 08:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 10:14:59.280,Y
Truck,Storage,Brain,Daily Task,2020/11/23 12:14:59.280,Y
I'm looking to get find the required tasks and the previous start time for this tasks. So in this case I would like the output to be something like:
Job Task Operator Comment Time Previous Time
Forklift Crating John Daily Task 11/23/2020 3:14 11/23/2020 0:14
Truck Storage Brain Daily Task 11/23/2020 6:14 11/23/2020 2:14
So far I have my code finding the required tasks and assigning the cells to pandas DFs. But I can't figure out how to loop back through the csv to find the previous time for matching rows.
import csv
import pandas
import datetime
from datetime import datetime, timedelta
Job=[]
Task=[]
Oper=[]
Time=[]
Comment=[]
with open('Job.csv','r') as source:
reader = csv.reader((line.replace('\0','') for line in source), delimiter=",", quotechar='"')
next(reader, None)
count = 0
CurrentTime = datetime.now()
Last24HrsTime = CurrentTime - timedelta(hours=24)
print(Last24HrsTime)
for row in reader:
count += 1
Required_Row = row[5]
JobStartTime = row[4]
if JobStartTime in (None, ""):
JobStartTimeFormat = "2019/09/08 01:01:59.280" #In case there is an error and the date is blank.
else:
JobStartTimeFormat = datetime.strptime(JobStartTime, '%Y/%m/%d %H:%M:%S.%f')
JobStartTimeFormatPrint = datetime.strftime(JobStartTimeFormat, '%Y/%m/%d %H:%M')
if Required_Row == "Y" and JobStartTimeFormat >= Last24HrsTime: #Is the job required and Job start is in the last 24hrs?
JobID = row[0]
TaskID = row[1]
OperID = row[2]
SearchRow = count + 1
Job.append({'Job': row[1]})
Task.append({'Task': row[2]})
Oper.append({'Oper': row[2]})
Time.append({'Tool': 'JobStartTimeFormatPrint'})
Comment.append({'Comment': row[3]})
##Need to loop here to find preivous values
Current output of this is:
Forklift Crating John 2020/11/23 00:14 Daily Task
Truck Storage Brain 2020/11/23 02:14 Daily Task
Forklift Crating John 2020/11/23 03:14 Daily Task
Truck Storage Brain 2020/11/23 06:14 Daily Task
Forklift Crating John 2020/11/23 08:14 Daily Task
Forklift Crating John 2020/11/23 10:14 Daily Task
Truck Storage Brain 2020/11/23 12:14 Daily Task
Any ideas how to loop through and just find the pervious occurrence?
Upvotes: 1
Views: 91
Reputation: 2060
I wouldn't use loops in this case, it gets a little messy. What you want is per group (e.g. task) the previous time. The previous time is defined as: given that the rows are sorted by time, the previous row. Getting the previous row can be done with .shift(1)
.
Full example:
import csv
import pandas
# Convert data to a list of dictionaries
with open("my_csv_file.csv", "r") as f:
reader = csv.DictReader(f)
data = [dict(line) for line in reader]
df = pandas.DataFrame(data)
# Parse time field
df["Time"] = pandas.to_datetime(df["Time"])
# Select only required tasks
df = df.loc[df["Required"] == "Y"]
def get_previous_time_per_group(group: pandas.DataFrame) -> pandas.DataFrame:
""" Method to get the previous time as a column per group.
Args:
group (pandas.DataFrame): The input group (created by a groupby command)
Returns:
pandas.DataFrame: The resulting group with the previous time column
"""
# Sort by the time column
group = group.sort_values("Time")
# Get the previous time by shifting the time column by 1 position
group["Previous Time"] = group["Time"].shift(1)
# Return the group, including the previous time
return group
# Group the dataframe by task (or whatever you want) and per group get the previous time column
df = df.groupby("Task").apply(get_previous_time_per_group)
# Clean up the index created by the groupby
df = df.droplevel(0).sort_index()
Full example:
import io
import csv
import pandas
f = io.StringIO("""Job,Task,Operator,Comment,Time,Required
Forklift,Crating,John,Daily Task,2020/11/23 00:14:59.280,Y
Truck,Crating,Steve,Daily Task,2020/11/23 01:14:59.280,N
Truck,Storage,Brain,Daily Task,2020/11/23 02:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 03:14:59.280,Y
Shop Floor,Stacking,Steve,Weekly,2020/11/23 04:14:59.280,N
Truck,Storage,Brain,Daily Task,2020/11/23 06:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 08:14:59.280,Y
Forklift,Crating,John,Daily Task,2020/11/23 10:14:59.280,Y
Truck,Storage,Brain,Daily Task,2020/11/23 12:14:59.280,Y
""")
# Convert data to a list of dictionaries
reader = csv.DictReader(f)
data = [dict(line) for line in reader]
df = pandas.DataFrame(data)
# Parse time field
df["Time"] = pandas.to_datetime(df["Time"])
# Select only required tasks
df = df.loc[df["Required"] == "Y"]
def get_previous_time_per_group(group: pandas.DataFrame) -> pandas.DataFrame:
""" Method to get the previous time as a column per group.
Args:
group (pandas.DataFrame): The input group (created by a groupby command)
Returns:
pandas.DataFrame: The resulting group with the previous time column
"""
# Sort by the time column
group = group.sort_values("Time")
# Get the previous time by shifting the time column by 1 position
group["Previous Time"] = group["Time"].shift(1)
# Return the group, including the previous time
return group
# Group the dataframe by task (or whatever you want) and per group get the previous time column
df = df.groupby("Task").apply(get_previous_time_per_group)
# Clean up the index created by the groupby
df = df.droplevel(0).sort_index()
Output
print(df)
Job Task Operator Comment Time Required \
0 Forklift Crating John Daily Task 2020-11-23 00:14:59.280 Y
2 Truck Storage Brain Daily Task 2020-11-23 02:14:59.280 Y
3 Forklift Crating John Daily Task 2020-11-23 03:14:59.280 Y
5 Truck Storage Brain Daily Task 2020-11-23 06:14:59.280 Y
6 Forklift Crating John Daily Task 2020-11-23 08:14:59.280 Y
7 Forklift Crating John Daily Task 2020-11-23 10:14:59.280 Y
8 Truck Storage Brain Daily Task 2020-11-23 12:14:59.280 Y
Previous Time
0 NaT
2 NaT
3 2020-11-23 00:14:59.280
5 2020-11-23 02:14:59.280
6 2020-11-23 03:14:59.280
7 2020-11-23 08:14:59.280
8 2020-11-23 06:14:59.280
NOTE: If there is no previous time in the data, the value will be NaT
(Not a Time)
Upvotes: 1