JeffGray
JeffGray

Reputation: 23

How to create rows for unique values in columns in pandas?

I have a pandas dataframe with thousands of rows like so:

IntentID     IntentName         Query           Response
1            Intent Name 1      Query 1         Response1
2            Intent Name 1      Query 1         Response2
3            Intent Name 2      Query 2         Response3
4            Intent Name 2      Query 2         Response4
5            Intent Name 3      Query 3         Response5

I need all unique values in "IntentName" to have the same IntentID value like so:

IntentID     IntentName         Query           Response
1            Intent Name 1      Query 1         Response1
1            Intent Name 1      Query 1         Response2
2            Intent Name 2      Query 2         Response3
2            Intent Name 2      Query 2         Response4
3            Intent Name 3      Query 3         Response5

What is the easiest way to do this?

Upvotes: 2

Views: 40

Answers (2)

oppressionslayer
oppressionslayer

Reputation: 7214

You can use regex:

df['IntentID'] = df.IntentName.apply(lambda x: re.search('(?P<num>\d+)',x).groups()[0])

output

  IntentID     IntentName    Query   Response
0        1  Intent Name 1  Query 1  Response1
1        1  Intent Name 1  Query 1  Response2
2        2  Intent Name 2  Query 2  Response3
3        2  Intent Name 2  Query 2  Response4
4        3  Intent Name 3  Query 3  Response5

Upvotes: 1

Code Different
Code Different

Reputation: 93161

Try this:

df['IntentID'] = df.groupby('IntentName') \
                    ['IntentID'].transform('first') \
                    .rank(method='dense') \
                    .astype('int')

How it works:

  • Group the rows by IntentName
  • For each group, keep the first IntentID
  • Rank those IntentIDs 1, 1, 2, 2, 3, etc. (method=dense)
  • Convert the ranks to int

Upvotes: 2

Related Questions