Find needed datetime between two columns of datetime

I am a noob in Python and generally in programming, which is why, please do not judge me, if I mistake in description of my request. So I need to solve a task.

I have two DateFrames. One of them has 20 000 rows and three columns: id_clients, start_time_visit, end_time_visit. Name of DateFrame = visits This is visits of users on websites.

Second DateFreme has 800 rows, and has columns: id_lead, time_create_lead, model. This data from CRM system.

So I need to find out which id_clients from websites create a lead in CRM.

I think that I can apply from cycle for example: start_time < time_of_create_lead < end_time

I have wrote this code:

dict_time = crm.set_index('time_create_lead').to_dict() # create a dict


def time_search (row):
""" This function has to return a row with create_time_of_lead"""

    time_start = row['start_time']
    time_end = row['end_time']
    if time_start <= dict_time <= time_end:
        return value
    else:
        return 0

goals['leed'] = goals.apply(time_search, axis=1)
  

But I have this error: TypeError: '<=' not supported between instances of 'Timestamp' and 'dict'

I research internet, but did not find a way to solve this. I will be thankful for any help.

Upvotes: 1

Views: 55

Answers (3)

SunilG
SunilG

Reputation: 345

If I understood your question and comment correctly, this should work-

df3.one = df.one
df3.two = df.two
df3.df2_date = df2.values()

Upvotes: 0

Yassine Majdoub
Yassine Majdoub

Reputation: 154

If I did understand what you want to do; you want to return the clients_id of visitors who created a lead in CRM.

To do this, you will need to to check every time_of_create_lead if it is between the start_time and end_time. If true, you will return the clients_id corresponding to these dates.

Here's a Code that can do the job (It will probably be time consuming given your large data):

lead_creators = pd.DataFrame(columns=['id_clients', 'id_lead', 'start_time_visit',
                                  'time_create_lead', 'end_time_visit'])

visits['start_time_visit'] = pd.to_datetime(visits['start_time_visit'])
visits['end_time_visit'] = pd.to_datetime(visits['end_time_visit'])
crm['time_create_lead'] = pd.to_datetime(crm['time_create_lead'])

for j in visits.index:
    for t in crm.index:
        if visits.iloc[j,1]<=crm.iloc[t,1]<=visits.iloc[j,2]:
            lead_creators = lead_creators.append({'id_clients' : visits.iloc[j,0],
                    'id_lead': crm.iloc[t,0], 'start_time_visit': visits.iloc[j,1],
                    'time_create_lead':crm.iloc[t,1],'end_time_visit':visits.iloc[j,2]},
                                                 ignore_index=True)

lead_creators

I already tried on a small dataset and it worked well.

Upvotes: 1

Raw Bit Rabbit
Raw Bit Rabbit

Reputation: 638

df = pd.DataFrame(columns=["one", "two"])

df.one = ["2019-01-24","2019-01-27"]
df.one = pd.to_datetime(df.one)

df.two = ["2019-01-28", "2020-01-29"]
df.two = pd.to_datetime(df.two)

print(df)

difference = (df.two - df.one)

Let me know if this approach works for you.

Upvotes: 0

Related Questions