ChrisG
ChrisG

Reputation: 1251

Calculating aged tickets open at a given point in time

Background: We have a ticketing system and each ticket has fields such as open date, close date, category, type, etc. Each ticket is represented by one row in my data with a key to identify the ticket.

An individual record might look like:

Number, Type, Category, Opened, Closed
TICKET100, Database, Software, 2/1/2020 11:30 AM, 4/22/2020 4:40 PM

Goal: My goal is to create a function that takes an input dataframe (pandas), some type of input age parameter, and a list of attributes/dimensions. The function would then return a dataframe of how many tickets were open past a certain age based on that age parameter, by each of the attributes/dimensions, by date.

Example of input:

ticket_age(input_dataframe, age=5, dimensions=['Type','Category'])

Example snippet of desired output:

Date, Type, Category, Count
3/1/2020, Database, Software, 1
3/2/2020, Database, Software, 1
...
4/22/2020, Database, Software, 0

An important note about the output... if the intersection of the date and the dimensions doesn't have any tickets that satisfy the condition, it should create a row with a count of 0.

Where I'm stuck: I have no idea how to accept the dimensions of a list of unknown size and iterate through all of them.

What have I tried? I've been successful in generating the ticket counts that meet the age criteria when I hard code the loops through dimensions.

How I calculate type_list, first_date, and total_days:

    #Create function to find the minimum date
def date_minimum(input_dataframe, date_to_check):
    return input_dataframe[date_to_check].min().date()

#Create function to find the maximum date
def date_maximum(input_dataframe,date_to_check):
    return input_dataframe[date_to_check].max().date()

    #setup min and max dates
    min_date=date_minimum(df_aged_input,'Opened')
    max_date=date_maximum(df_aged_input,'Closed')
    #Get the first relevant date for the dataframe loop
    first_date=min_date+datetime.timedelta(days=aged_window)

    # Generate a list of unique assignment groups
    type_list=df_aged_input['Type'].unique().tolist()

My loop

aged_output_list=[]
for type_iterate in range(len(type_list)):
    #filter by the type
    aged_type=type_list[type_iterate]
    df_aged_input=df_tkt_relevant[df_tkt_relevant['Type']==aged_type].copy()
    for date_iterate in range(totalDays.days):
        #generate the aged date iterator
        aged_date=first_date+datetime.timedelta(days=date_iterate)
        #Count the number of records in the data frame that match the input conditions
        aged_frame=df_aged_input[(~(df_aged_input['Closed'].dt.date<aged_date))&(df_aged_input['Opened'].dt.date<(aged_date-datetime.timedelta(days=aged_window+1)))].copy()
        aged_frame['aged_Date']=aged_date-datetime.timedelta(days=1)
        aged_count=aged_frame.shape[0]
        #Write the date from iterative date and the aged count to a new data source
        aged_output_list.append([aged_date,aged_type or 'Error: Missing Type',aged_count])

What do I do next? Is there another library that does all of this for me?

Upvotes: 4

Views: 569

Answers (1)

It_is_Chris
It_is_Chris

Reputation: 14113

IIUC

import pandas as pd
from pandas.tseries.offsets import DateOffset
from io import StringIO
import numpy as np

# sample data
s = """Number,Type,Category,Opened,Closed
TICKET100,Database,Software,2/1/2020 11:30 AM,2/22/2020 4:40 PM
TICKET101,Database,Software,2/10/2020 11:30 AM,2/23/2020 4:40 PM
TICKET102,Database,Software,2/11/2020 11:30 AM,2/22/2020 4:40 PM
TICKET103,something,else,2/10/2020 11:30 AM,2/23/2020 4:40 PM
TICKET104,something,else,2/12/2020 11:30 AM,2/22/2020 4:40 PM"""
df = pd.read_csv(StringIO(s))

# convert to datetime
df['Opened'] = pd.to_datetime(df['Opened'])
df['Closed'] = pd.to_datetime(df['Closed'])

# create a function
def myFunc(df, age, dimensions):
    # zip the date offset for open and close together
    z = zip(df['Opened'].dt.date + DateOffset(age), df['Closed'].dt.date - DateOffset(age))
    # create a daterange for each record in the dataframe
    df['date_range'] = [pd.date_range(o, c, freq='D') for o,c in z]
    # groupby your dimensions columns
    g = df.groupby(dimensions)
    # list comprehension for each group and create the date range
    idxs = [pd.date_range(df['Opened'].min(), df['Closed'].max(), freq='D',
                          normalize=True) for _,df in g]
    # sum all the true values where the datetime index create above
    # is in the date_range column
    l = [[sum(df[1]['date_range'].apply(lambda x: idx.isin(x))),
          idx, [df[0]]*len(idx)] for idx,df  in zip(idxs, g)]
    # transpose and hstack the list
    arr = np.transpose(np.hstack(l))
    # return pandas dataframe
    return pd.DataFrame(arr, columns=['count', 'date', 'dimensions'])

out = myFunc(df, 5, ['Type', 'Category'])

    count       date             dimensions
0       0 2020-02-01  (Database, Software)
1       0 2020-02-02  (Database, Software)
2       0 2020-02-03  (Database, Software)
3       0 2020-02-04  (Database, Software)
4       0 2020-02-05  (Database, Software)
5       1 2020-02-06  (Database, Software)
6       1 2020-02-07  (Database, Software)
7       1 2020-02-08  (Database, Software)
8       1 2020-02-09  (Database, Software)
9       1 2020-02-10  (Database, Software)
10      1 2020-02-11  (Database, Software)
11      1 2020-02-12  (Database, Software)
12      1 2020-02-13  (Database, Software)
13      1 2020-02-14  (Database, Software)
14      2 2020-02-15  (Database, Software)
15      3 2020-02-16  (Database, Software)
16      3 2020-02-17  (Database, Software)
17      1 2020-02-18  (Database, Software)
18      0 2020-02-19  (Database, Software)
19      0 2020-02-20  (Database, Software)
20      0 2020-02-21  (Database, Software)
21      0 2020-02-22  (Database, Software)
22      0 2020-02-23  (Database, Software)
23      0 2020-02-10     (something, else)
24      0 2020-02-11     (something, else)
25      0 2020-02-12     (something, else)
26      0 2020-02-13     (something, else)
27      0 2020-02-14     (something, else)
28      1 2020-02-15     (something, else)
29      1 2020-02-16     (something, else)
30      2 2020-02-17     (something, else)
31      1 2020-02-18     (something, else)
32      0 2020-02-19     (something, else)
33      0 2020-02-20     (something, else)
34      0 2020-02-21     (something, else)
35      0 2020-02-22     (something, else)
36      0 2020-02-23     (something, else)

Upvotes: 3

Related Questions