user308827
user308827

Reputation: 21961

Add column with unique identifiers based on values from other columns in pandas

I have the foll. dataframe:

Cnt Year    JD  Min_Temp
S   2000    1   277.139
S   2000    2   274.725
S   2001    1   270.945
S   2001    2   271.505
N   2000    1   257.709
N   2000    2   254.533
N   2000    3   258.472
N   2001    1   255.763
N   2001    2   265.714
N   2001    3   267.943

I would like to add a new column where each separate row for a given 'Cnt' is given a unique identifier (1,2,3...). So, the result should look like this:

Cnt Year    JD  Min_Temp    unq
S   2000    1   277.139     1
S   2000    2   274.725     2
S   2001    1   270.945     3
S   2001    2   271.505     4
N   2000    1   257.709     1
N   2000    2   254.533     2
N   2000    3   258.472     3
N   2001    1   255.763     4
N   2001    2   265.714     5
N   2001    3   267.943     6

Here, each row corresponding to the same value in the column 'Cnt' as a unique identifier.

Currently, all I can do is add a new column with increasing values: df['unq'] = numpy.arange(1,len(df))

Upvotes: 0

Views: 1635

Answers (1)

DSM
DSM

Reputation: 353019

You could use groupby with cumcount

>>> df["unq"] = df.groupby("Cnt").cumcount() + 1
>>> df
  Cnt  Year  JD  Min_Temp  unq
0   S  2000   1   277.139    1
1   S  2000   2   274.725    2
2   S  2001   1   270.945    3
3   S  2001   2   271.505    4
4   N  2000   1   257.709    1
5   N  2000   2   254.533    2
6   N  2000   3   258.472    3
7   N  2001   1   255.763    4
8   N  2001   2   265.714    5
9   N  2001   3   267.943    6

Note that because the groups are based on the Cnt column values and not on contiguity, if you have a second group of S below the group of N, the first unq value in that group will be 5.

Upvotes: 3

Related Questions