Reputation: 185
I have a call log data made on customers. Which looks something like below, where ID is customer ID and A and B are log attributes:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(10, 2)), columns=list('AB'),
index = ['A','A','A','B','B','C','C','C','D','D'])
df['ID']=df.index
df = df[['ID','A','B']]
ID A B
A A 46 31
A A 99 54
A A 34 9
B B 46 48
B B 7 75
C C 1 25
C C 71 40
C C 74 53
D D 57 17
D D 19 78
I want to replicate each set of event for each ID based on some slots. For e.g. if slot value is 2 then all events for ID "A" should be replicated slot-1 times.
ID A B
A A 46 31
A A 99 54
A A 34 9
A A 46 31
A A 99 54
A A 34 9
and a new Index should be created indicating which slot does replicated values belong to:
ID A B Index
A 46 31 A-1
A 99 54 A-1
A 34 9 A-1
A 46 31 A-2
A 99 54 A-2
A 34 9 A-2
I have tried following solution:
slots = 2
nba_data = pd.DataFrame()
idx = pd.Index(list(range(1,slots+1)))
for i in unique_rec_counts_dict:
b = df.loc[df.ID==i,:]
b = b.append([b]*(slots-1),ignore_index=True)
b['Index'] = str(i)+'-'+idx.repeat(unique_rec_counts_dict[i]).astype(str)
nba_data = nba_data.append(b)
it gives me the expected output but is not scalable when slots are increased and number of customers increases in order of 10k.
ID A B Index
0 A 46 31 A-1
1 A 99 54 A-1
2 A 34 9 A-1
3 A 46 31 A-2
4 A 99 54 A-2
5 A 34 9 A-2
0 B 46 48 B-1
1 B 7 75 B-1
2 B 46 48 B-2
3 B 7 75 B-2
0 C 1 25 C-1
1 C 71 40 C-1
2 C 74 53 C-1
3 C 1 25 C-2
4 C 71 40 C-2
5 C 74 53 C-2
0 D 57 17 D-1
1 D 19 78 D-1
2 D 57 17 D-2
3 D 19 78 D-2
I think its taking a long time because of the loop. Any solution which is vectorized will be really helpful.
Upvotes: 1
Views: 49
Reputation: 150745
You can try:
slots = 2
new_df = pd.concat(df.assign(Index=f'_{i}') for i in range(1, slots+1))
new_df['Index'] = new_df['ID'] + new_df['Index']
Output:
ID A B Index
A A 48 61 A_1
A A 70 13 A_1
A A 36 23 A_1
B B 22 66 B_1
B B 92 95 B_1
C C 53 9 C_1
C C 41 57 C_1
C C 88 93 C_1
D D 76 82 D_1
D D 11 36 D_1
A A 48 61 A_2
A A 70 13 A_2
A A 36 23 A_2
B B 22 66 B_2
B B 92 95 B_2
C C 53 9 C_2
C C 41 57 C_2
C C 88 93 C_2
D D 76 82 D_2
D D 11 36 D_2
Upvotes: 1