Reputation: 5463
I'll try to explain my problem as best I can, but I'm new to Pandas, so please bear with me. I have a Pandas dataframe df:
    Random_ID  Seq_ID   Type  Seq Token
0           8       1   User      First
1           8       2  Agent     Second
2           8       3   User     Second
3           8       4   User     Second
4           8       5  Agent     Second
5          13       1   User      First
6          13       2  Agent     Second
7          13       3   User     Second
8          13       4  Agent     Second
9          13       5   User     Second
10         13       6  Agent     Second
11         13       7   User     Second
12         13       8  Agent     Second
13         13       9   User     Second
14         13      10  Agent     Second
I have been trying to change the values of Seq Token (User_First, Agent_Last...) based on the occurrence of User and Agent in Type in each group of df.groupby('Random_ID'). To illustrate further, take the last row of each group:
grouped = df.groupby('Random_ID').last()
which gives:
           Seq_ID   Type  Seq Token
Random_ID
8               5  Agent     Second
13             10  Agent     Second
Here, if Type=Agent, then Seq Token should be Agent_Final. The df should then look like:
    Random_ID  Seq_ID   Type    Seq Token
0           8       1   User        First
1           8       2  Agent       Second
2           8       3   User       Second
3           8       4   User       Second
4           8       5  Agent  Agent_Final
5          13       1   User        First
6          13       2  Agent       Second
7          13       3   User       Second
8          13       4  Agent       Second
9          13       5   User       Second
10         13       6  Agent       Second
11         13       7   User       Second
12         13       8  Agent       Second
13         13       9   User       Second
14         13      10  Agent  Agent_Final
I've tried the following:
grouped = df.groupby('Random_ID', as_index=False).last()['Type']
for i in grouped:
    if i == 'Agent':
        df['Seq Token'] = 'Agent_Final'
but this sets every value in Seq Token to 'Agent_Final':
    Random_ID  Seq_ID   Type    Seq Token
0           8       1   User  Agent_Final
1           8       2  Agent  Agent_Final
2           8       3   User  Agent_Final
3           8       4   User  Agent_Final
4           8       5  Agent  Agent_Final
I read that groupby creates a copy of the original df, so the original cannot be changed through it unless one explicitly assigns to df[column]. I hope this makes sense.
I've managed to set the first row of each group to "First" using np.where() like this:
df['Seq Token'] = np.where((np.logical_and(np.equal(df['Type'],'User'), np.equal(df['Seq_ID'],1))), 'First', 'Second')
You can see that I've already applied this to the df. Note that I used the Seq_ID value to get the first row in the group.
If there is a way to chain np.where() in such a way that I can assign Seq Token as User_First (same as First), User_Middle (if Type=User occurs in the middle), Agent_Middle (if Type=Agent occurs in the middle), Agent_Last (as explained above: if Agent is last), then it would be the ideal solution. However, any other solutions are welcome too.
Thanks in advance!
Upvotes: 6
Views: 11405
Reputation: 323226
IIUC, you can collect the index of the matching rows after groupby and assign through it:
s=df.groupby('Random_ID').tail(1).loc[lambda x : x.Type=='Agent'].index
s
Out[62]: Int64Index([4, 14], dtype='int64')
df.loc[s,'SeqToken']='Agent_Final'
df
Out[64]:
    Random_ID  Seq_ID   Type     SeqToken
0           8       1   User        First
1           8       2  Agent       Second
2           8       3   User       Second
3           8       4   User       Second
4           8       5  Agent  Agent_Final
5          13       1   User        First
6          13       2  Agent       Second
7          13       3   User       Second
8          13       4  Agent       Second
9          13       5   User       Second
10         13       6  Agent       Second
11         13       7   User       Second
12         13       8  Agent       Second
13         13       9   User       Second
14         13      10  Agent  Agent_Final
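If you also want the other labels from the question (User_First, User_Middle, Agent_Middle), one possible sketch is to replace the chained np.where() with np.select, which picks the first matching condition per row. This is only an illustration under a few assumptions: the rows are already ordered within each Random_ID, the column is named 'Seq Token' as in the question, the last-row label is spelled Agent_Final as in the expected output, and "Other" is a made-up fallback for any Type that is neither User nor Agent. A group that starts with Agent or ends with User simply falls through to the *_Middle labels here.

```python
import numpy as np
import pandas as pd

# Small frame shaped like the question's df (column names taken from the question).
df = pd.DataFrame({
    "Random_ID": [8] * 5 + [13] * 10,
    "Seq_ID": list(range(1, 6)) + list(range(1, 11)),
    "Type": ["User", "Agent", "User", "User", "Agent"] + ["User", "Agent"] * 5,
})

# Position flags per Random_ID group; assumes rows are already in order.
is_first = ~df.duplicated("Random_ID", keep="first")
is_last = ~df.duplicated("Random_ID", keep="last")
is_user = df["Type"].eq("User")
is_agent = df["Type"].eq("Agent")

# np.select uses the first condition that is True for each row,
# so the more specific first/last checks come before the generic ones.
conditions = [
    is_first & is_user,
    is_last & is_agent,
    is_user,
    is_agent,
]
labels = ["User_First", "Agent_Final", "User_Middle", "Agent_Middle"]

# "Other" is a hypothetical fallback label, not something from the question.
df["Seq Token"] = np.select(conditions, labels, default="Other")
print(df)
```

Using duplicated() for the first/last flags avoids relying on Seq_ID starting at 1, but it does depend on the row order, so sort by Random_ID and Seq_ID first if that is not guaranteed.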
Upvotes: 5