Reputation: 11
I am a noob in Python and generally in programming, which is why, please do not judge me, if I mistake in description of my request. So I need to solve a task.
I have two DateFrames. One of them has 20 000 rows and three columns: id_clients, start_time_visit, end_time_visit. Name of DateFrame = visits This is visits of users on websites.
Second DateFreme has 800 rows, and has columns: id_lead, time_create_lead, model. This data from CRM system.
So I need to find out which id_clients from websites create a lead in CRM.
I think that I can apply from cycle for example: start_time < time_of_create_lead < end_time
I have wrote this code:
dict_time = crm.set_index('time_create_lead').to_dict() # create a dict
def time_search (row):
""" This function has to return a row with create_time_of_lead"""
time_start = row['start_time']
time_end = row['end_time']
if time_start <= dict_time <= time_end:
return value
else:
return 0
goals['leed'] = goals.apply(time_search, axis=1)
But I have this error: TypeError: '<=' not supported between instances of 'Timestamp' and 'dict'
I research internet, but did not find a way to solve this. I will be thankful for any help.
Upvotes: 1
Views: 55
Reputation: 345
If I understood your question and comment correctly, this should work-
df3.one = df.one
df3.two = df.two
df3.df2_date = df2.values()
Upvotes: 0
Reputation: 154
If I did understand what you want to do; you want to return the clients_id of visitors who created a lead in CRM.
To do this, you will need to to check every time_of_create_lead if it is between the start_time and end_time. If true, you will return the clients_id corresponding to these dates.
Here's a Code that can do the job (It will probably be time consuming given your large data):
lead_creators = pd.DataFrame(columns=['id_clients', 'id_lead', 'start_time_visit',
'time_create_lead', 'end_time_visit'])
visits['start_time_visit'] = pd.to_datetime(visits['start_time_visit'])
visits['end_time_visit'] = pd.to_datetime(visits['end_time_visit'])
crm['time_create_lead'] = pd.to_datetime(crm['time_create_lead'])
for j in visits.index:
for t in crm.index:
if visits.iloc[j,1]<=crm.iloc[t,1]<=visits.iloc[j,2]:
lead_creators = lead_creators.append({'id_clients' : visits.iloc[j,0],
'id_lead': crm.iloc[t,0], 'start_time_visit': visits.iloc[j,1],
'time_create_lead':crm.iloc[t,1],'end_time_visit':visits.iloc[j,2]},
ignore_index=True)
lead_creators
I already tried on a small dataset and it worked well.
Upvotes: 1
Reputation: 638
df = pd.DataFrame(columns=["one", "two"])
df.one = ["2019-01-24","2019-01-27"]
df.one = pd.to_datetime(df.one)
df.two = ["2019-01-28", "2020-01-29"]
df.two = pd.to_datetime(df.two)
print(df)
difference = (df.two - df.one)
Let me know if this approach works for you.
Upvotes: 0