I have a dataframe like this:
   customer_id some_data
0            1         A
1            2         B
2            3         C
3            1         D
and a list of customer_id values that may contain duplicates, for example [1, 2, 2]. Based on these values I want a dataframe containing the rows whose customer_id equals a value in the list, and if a value appears more than once in the list, its matching rows should appear that many times. For example, for [1, 2, 2] my output should be:
   customer_id some_data
0            1         A
3            1         D
1            2         B
1            2         B
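I tried something like this:
import pandas as pd

ids = [1, 2, 2]  # avoid naming this `list`, which shadows the built-in type
df_new = df[df.customer_id == ids[0]]
for i in range(1, len(ids)):
    temp = df[df.customer_id == ids[i]]  # rows matching the i-th id
    df_new = pd.concat([df_new, temp])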
This code works, but my df is large, so it takes a long time to run. Can I optimize it somehow?
Create another dummy dataframe with the ids you want:
df2 = pd.DataFrame({'customer_id':[1,2,2]})
   customer_id
0            1
1            2
2            2
and merge it with the given dataframe:
df.merge(df2)
desired result:
   customer_id some_data
0            1         A
1            1         D
2            2         B
3            2         B
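Here is a self-contained version of the merge approach, using the sample data from the question. It is a minimal sketch; `ids` and `ids_df` are illustrative names. One deliberate variation: it puts the id frame on the left (`ids_df.merge(df, ...)` instead of `df.merge(df2)`). An inner merge orders its rows by the left frame's keys, so with the ids on the left the result should follow the order of the list. With `df` on the left, recent pandas versions (2.2 and later, as far as I know) return the rows in `df`'s order instead, which can differ from the result shown above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"customer_id": [1, 2, 3, 1], "some_data": ["A", "B", "C", "D"]})
ids = [1, 2, 2]  # illustrative name for the list of wanted ids

# One row per wanted id; a repeated id becomes a repeated row
ids_df = pd.DataFrame({"customer_id": ids})

# Inner merge on customer_id: every id row is paired with every matching row of df,
# so id 1 yields two rows (A and D) and id 2 yields B once per occurrence in the list
result = ids_df.merge(df, on="customer_id")
print(result)
```

Because the default is an inner merge, ids that do not occur in `df` are simply dropped rather than producing empty rows.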
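Most importantly: your code works, but it is very slow for large data. The long processing time comes from the for loop: every iteration scans the whole dataframe with a boolean mask, and every pd.concat copies all the rows collected so far. To optimize it, you should always aim for vectorized operations, such as the single merge above, which matches all the ids in one call.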
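The expected output in the question keeps the original index labels (0, 3, 1, 1), while a merge returns a fresh 0..n-1 index. If those labels matter, one option, sketched here under the assumption that the index is unnamed, is to carry the index through the merge as a column and restore it afterwards. The names `ids_df`, `loop_result` and `merged` are illustrative. The sketch also runs the loop from the question on the same data so the two results can be compared.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"customer_id": [1, 2, 3, 1], "some_data": ["A", "B", "C", "D"]})
ids = [1, 2, 2]

# Loop from the question: a full boolean scan of df per id,
# plus a concat that copies the rows gathered so far
loop_result = df[df.customer_id == ids[0]]
for i in range(1, len(ids)):
    loop_result = pd.concat([loop_result, df[df.customer_id == ids[i]]])

# Vectorized: reset_index() turns the unnamed index into a column called "index",
# a single merge matches all ids, and set_index("index") puts the labels back
ids_df = pd.DataFrame({"customer_id": ids})
merged = ids_df.merge(df.reset_index(), on="customer_id").set_index("index")
merged.index.name = None  # drop the "index" name left over from reset_index

print(merged)
print(merged.index.tolist() == loop_result.index.tolist())
```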