I have a dataset like the one below:
Name ARowss TotalRowss Percentage
motors 11 11 100
trck1 2 2 100
trck2 2 2 100
hydr1 4 4 100
gas1 2 2 100
As part of a data cleanup, I need to assign a new sequential number to each value in the "Name" column. All values in "Name" are unique. So, in the dataset above, "motors" should get 1, "trck1" should get 2, "trck2" should get 3, and so on.
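To make the snippets reproducible, here is a minimal sketch that rebuilds the sample data as a pandas DataFrame. Because the question states that every "Name" is unique, a plain 1-based row counter already produces the requested numbering; the `id` column name is just an illustrative choice.

```python
import pandas as pd

# Rebuild the sample dataset from the question.
df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1"],
    "ARowss": [11, 2, 2, 4, 2],
    "TotalRowss": [11, 2, 2, 4, 2],
    "Percentage": [100, 100, 100, 100, 100],
})

# With unique names, numbering rows 1..n in their current order gives
# motors -> 1, trck1 -> 2, trck2 -> 3, and so on.
df["id"] = range(1, len(df) + 1)
print(df)
```

This only works while the names stay unique; if duplicates can appear, a per-value encoding such as `pd.factorize` is the safer choice.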
Is this what you want?
In [5]: df['id'] = pd.factorize(df.Name)[0]
In [6]: df
Out[6]:
Name ARowss TotalRowss Percentage id
0 motors 11 11 100 0
1 trck1 2 2 100 1
2 trck2 2 2 100 2
3 hydr1 4 4 100 3
4 gas1 2 2 100 4
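`pd.factorize` returns a tuple of two things: an integer code for each element and the unique values. The `[0]` above picks the codes. The sketch below shows both parts, and that the default numbering follows order of first appearance (which is what the question asks for), whereas `sort=True` numbers the values alphabetically instead.

```python
import pandas as pd

names = pd.Series(["motors", "trck1", "trck2", "hydr1", "gas1"])

# codes: one integer per element; uniques: distinct values in first-appearance order
codes, uniques = pd.factorize(names)
print(codes)
print(list(uniques))

# sort=True assigns codes by the sorted order of the unique values instead,
# so "gas1" (alphabetically first) gets 0 here.
sorted_codes, sorted_uniques = pd.factorize(names, sort=True)
print(sorted_codes)
```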
Or this, if you want to replace the names themselves with numbers starting at 1:
In [10]: df.Name = pd.factorize(df.Name)[0] + 1
In [11]: df
Out[11]:
Name ARowss TotalRowss Percentage
0 1 11 11 100
1 2 2 2 100
2 3 2 2 100
3 4 4 4 100
4 5 2 2 100
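Overwriting `Name` discards the original strings. If you may need them back later, one option is to keep the `uniques` part of the `pd.factorize` result as a lookup before replacing the column. A minimal sketch; `name_to_id` and `id_to_name` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1"],
    "ARowss": [11, 2, 2, 4, 2],
})

codes, uniques = pd.factorize(df["Name"])

# Build the name -> number lookup (1-based) before overwriting the column.
name_to_id = {name: i + 1 for i, name in enumerate(uniques)}
df["Name"] = codes + 1

# Reverse lookup: translate the numbers back to the original names.
id_to_name = {v: k for k, v in name_to_id.items()}
restored = df["Name"].map(id_to_name)
print(df)
print(restored)
```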
It also works for non-unique values; repeated names get the same number:
In [15]: df
Out[15]:
Name ARowss TotalRowss Percentage
0 motors 11 11 100
1 trck1 2 2 100
2 trck2 2 2 100
3 hydr1 4 4 100
4 gas1 2 2 100 # duplicates in `Name`
5 gas1 2 3 111 #
In [16]: df.Name = pd.factorize(df.Name)[0] + 1
In [17]: df
Out[17]:
Name ARowss TotalRowss Percentage
0 1 11 11 100
1 2 2 2 100
2 3 2 2 100
3 4 4 4 100
4 5 2 2 100 # both gas1 rows -> 5
5 5 2 3 111 #
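An alternative I would consider for the duplicate case is `groupby(...).ngroup()`, which numbers each group. This is a different technique from `pd.factorize`, shown here as a sketch for comparison: with `sort=False` the groups are numbered in order of first appearance, which matches the factorize result on this data, while the default `sort=True` would number the names alphabetically.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["motors", "trck1", "trck2", "hydr1", "gas1", "gas1"],
    "ARowss": [11, 2, 2, 4, 2, 2],
    "TotalRowss": [11, 2, 2, 4, 2, 3],
    "Percentage": [100, 100, 100, 100, 100, 111],
})

# factorize: repeated names share one code, numbered by first appearance.
via_factorize = pd.factorize(df["Name"])[0] + 1

# ngroup: number each group; sort=False keeps first-appearance order.
via_ngroup = df.groupby("Name", sort=False).ngroup() + 1

print(via_factorize)
print(via_ngroup.tolist())
```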