Reputation: 317
I have a contact data below.
mobile email contact_code index_clone contact_day
0 0972135314 [email protected] 1 0 9/26/2018
1 0972135314 [email protected] 2 1 9/26/2018
2 0943360092 [email protected] 3 2 9/25/2018
3 0988870766 [email protected] 4 3 9/24/2018
4 0972135314 [email protected] 5 4 9/27/2018
I need to create a new column called "user". User column get data from contact code, if contacts have the same mobile or the same email, they are the same user, and the user value for these contacts is the smallest contact_code.
For example: Row 4: mobile is 0972135314 so it matches row 0 and 1, email is [email protected] so it matches row 2. The contact_code for all 3 matched row is {1,2,3}, so user column of row 4 is 1, the smallest value.
The problem is, I need to loop through each row to find matched rows, get the list of contact_code and assign the smallest value to user column. So Pandas runs slowly. How can I make the process faster ? Here my code.
for idx in data.index:
data.set_value(idx,"user",min(data.loc[(data["index_clone"] <= idx) & ((data["mobile"] == data.get_value(idx,"mobile")) | (data["email"] == data.get_value(idx,"email"))),"contact_code"]))
*Update: Thank w-m and jpp for your answers. But I have another problem. Every row have a contact_day column, it's the day of the contact. For example, there are 3 contacts row of user 1 (row 0, 1 and 4). Row 0's contact_day and row 1's contact day is 09/26/2018, and row 4's contact_day is 09/27/2018. So row 0 and 1 "turn" column is 1, and row 4 "turn" is 2. How can I get the turn of the contact of the user without looping through every row ?
Upvotes: 3
Views: 107
Reputation: 11232
You could separately get the minimum contact_code for each group of mobiles, then for each group of emails:
min_mobile_cc = df.groupby("mobile").contact_code.transform(np.min)
min_email_cc = df.groupby("email").contact_code.transform(np.min)
For each row in the data, this stores either their original contact_code, or the minimum contact code of any of the other rows in the same mobile/email group.
With this preparation you can then choose the global minimum contact code between mobile and email groups:
df["user"] = np.minimum(min_mobile_cc, min_email_cc)
Upvotes: 2
Reputation: 164783
One way is to sort your dataframe by descending contact_code
and create a couple of dictionary mappings. Then use these mappings to derive the correct contact_code
.
This works because during dictionary construction values for keys are overwritten by later assignments. You are only interested in the minimum mappings, which is applied via our initial sorting.
df_rev = df.sort_values('contact_code', ascending=False)
d1 = df_rev.set_index('mobile')['contact_code'].to_dict()
d2 = df_rev.set_index('email')['contact_code'].to_dict()
df['contact_code_new'] = np.minimum(df['mobile'].map(d1), df['email'].map(d2))
print(df)
mobile email contact_code index_clone contact_code_new
0 972135314 [email protected] 1 0 1
1 972135314 [email protected] 2 1 1
2 943360092 [email protected] 3 2 3
3 988870766 [email protected] 4 3 4
4 972135314 [email protected] 5 4 1
Upvotes: 0