Add column with unique identifiers based on values from other columns in pandas

Question

I have the foll. dataframe:

Cnt Year    JD  Min_Temp
S   2000    1   277.139
S   2000    2   274.725
S   2001    1   270.945
S   2001    2   271.505
N   2000    1   257.709
N   2000    2   254.533
N   2000    3   258.472
N   2001    1   255.763
N   2001    2   265.714
N   2001    3   267.943

I would like to add a new column where each separate row for a given 'Cnt' is given a unique identifier (1,2,3...). So, the result should look like this:

Cnt Year    JD  Min_Temp    unq
S   2000    1   277.139     1
S   2000    2   274.725     2
S   2001    1   270.945     3
S   2001    2   271.505     4
N   2000    1   257.709     1
N   2000    2   254.533     2
N   2000    3   258.472     3
N   2001    1   255.763     4
N   2001    2   265.714     5
N   2001    3   267.943     6

Here, each row corresponding to the same value in the column 'Cnt' as a unique identifier.

Currently, all I can do is add a new column with increasing values: df['unq'] = numpy.arange(1,len(df))

DSM · Accepted Answer

You could use groupby with cumcount

>>> df["unq"] = df.groupby("Cnt").cumcount() + 1
>>> df
  Cnt  Year  JD  Min_Temp  unq
0   S  2000   1   277.139    1
1   S  2000   2   274.725    2
2   S  2001   1   270.945    3
3   S  2001   2   271.505    4
4   N  2000   1   257.709    1
5   N  2000   2   254.533    2
6   N  2000   3   258.472    3
7   N  2001   1   255.763    4
8   N  2001   2   265.714    5
9   N  2001   3   267.943    6

Note that because the groups are based on the Cnt column values and not on contiguity, if you have a second group of S below the group of N, the first unq value in that group will be 5.

Add column with unique identifiers based on values from other columns in pandas

Answers (1)

Related Questions