Reputation: 23
I have a pandas dataframe with thousands of rows like so:
IntentID IntentName Query Response
1 Intent Name 1 Query 1 Response1
2 Intent Name 1 Query 1 Response2
3 Intent Name 2 Query 2 Response3
4 Intent Name 2 Query 2 Response4
5 Intent Name 3 Query 3 Response5
I need all unique values in "IntentName" to have the same IntentID value like so:
IntentID IntentName Query Response
1 Intent Name 1 Query 1 Response1
1 Intent Name 1 Query 1 Response2
2 Intent Name 2 Query 2 Response3
2 Intent Name 2 Query 2 Response4
3 Intent Name 3 Query 3 Response5
What is the easiest way to do this?
Upvotes: 2
Views: 40
Reputation: 7214
You can use regex:
df['IntentID'] = df.IntentName.apply(lambda x: re.search('(?P<num>\d+)',x).groups()[0])
output
IntentID IntentName Query Response
0 1 Intent Name 1 Query 1 Response1
1 1 Intent Name 1 Query 1 Response2
2 2 Intent Name 2 Query 2 Response3
3 2 Intent Name 2 Query 2 Response4
4 3 Intent Name 3 Query 3 Response5
Upvotes: 1
Reputation: 93161
Try this:
df['IntentID'] = df.groupby('IntentName') \
['IntentID'].transform('first') \
.rank(method='dense') \
.astype('int')
How it works:
IntentName
IntentID
IntentID
s 1, 1, 2, 2, 3, etc. (method=dense
)Upvotes: 2