Pandas DataFrame Groupby to get Unique row condition and identify with increasing value up to Number of Groups

Question

I have a DataFrame where a combination of column values (A, B, C) identifies a unique address. I would like to find all rows belonging to each address and assign them a unique identifier, incrementing the identifier once per address.

For example, I would like to generate the following ID column:

A B C D E  ID
0 1 1 0 1  0
0 1 2 0 1  1
0 1 1 1 1  0
0 1 3 0 1  2
0 1 2 1 0  1
0 1 1 2 1  0
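The ID column follows a simple rule: each (A, B, C) address gets the next integer the first time it appears and reuses that integer afterwards. A minimal plain-Python sketch of that rule on the table's data (the dict `ids` is just an illustrative helper, not part of the question):

```python
import pandas as pd

# Sample data from the table above (columns A to E, without ID)
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1, 1],
    "C": [1, 2, 1, 3, 2, 1],
    "D": [0, 0, 1, 0, 1, 2],
    "E": [1, 1, 1, 1, 0, 1],
})

# Map each (A, B, C) address to the next free integer the first time it is seen;
# setdefault returns the existing ID for addresses that were already seen.
ids = {}
df["ID"] = [ids.setdefault(key, len(ids)) for key in zip(df["A"], df["B"], df["C"])]

print(df["ID"].tolist())  # [0, 1, 0, 2, 1, 0]
```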

I tried the following:

id = 0
def set_id(df):
    global id
    df['ID'] = id
    id += 1


df.groupby(['A','B','C']).transform(set_id)

This returns a null DataFrame, so this is clearly not the way to do it; I am new to pandas. I suspect the solution should use df[['A','B','C']].drop_duplicates() to get all the unique combinations. Thank you.
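The global counter in the attempt above is trying to number each (A, B, C) group. pandas has a built-in for that, `GroupBy.ngroup()`, which is a different technique from the accepted answer's merge. A minimal sketch, assuming the sample data from the table: passing `sort=False` to `groupby` numbers the groups in the order they first appear rather than in sorted key order.

```python
import pandas as pd

# Sample data from the question's table (ID column omitted)
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1, 1],
    "C": [1, 2, 1, 3, 2, 1],
    "D": [0, 0, 1, 0, 1, 2],
    "E": [1, 1, 1, 1, 0, 1],
})

# ngroup() returns, for every row, the number of the group that row belongs to;
# sort=False makes group 0 the first address seen, group 1 the second, and so on.
df["ID"] = df.groupby(["A", "B", "C"], sort=False).ngroup()

print(df["ID"].tolist())  # [0, 1, 0, 2, 1, 0]
```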

Alex Fung · Accepted Answer

I think this is what you need:

import pandas as pd

df2 = df[['A', 'B', 'C']].drop_duplicates()  # get the unique combinations of A, B, C
df2 = df2.reset_index(drop=True).reset_index()  # renumber rows 0..n-1, then move that index into a column named 'index'
df2 = df2.rename(columns={'index': 'ID'})  # rename 'index' to 'ID'
df = pd.merge(df, df2, on=['A', 'B', 'C'], how='left')  # attach the matching ID to every row of df
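As a usage example, here is a sketch that runs the same steps end to end on the question's sample data. The `validate="many_to_one"` argument is an optional addition, not part of the answer; it is a standard `pd.merge` option that raises an error if the lookup table `df2` ever contains a repeated (A, B, C) key. Because `drop_duplicates` keeps first occurrences in their original order, the IDs come out in order of first appearance.

```python
import pandas as pd

# Sample data from the question's table (ID column omitted)
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1, 1],
    "C": [1, 2, 1, 3, 2, 1],
    "D": [0, 0, 1, 0, 1, 2],
    "E": [1, 1, 1, 1, 0, 1],
})

df2 = df[["A", "B", "C"]].drop_duplicates()
df2 = df2.reset_index(drop=True).reset_index()
df2 = df2.rename(columns={"index": "ID"})
# how="left" keeps df's row order; validate checks that each key in df2 is unique
df = pd.merge(df, df2, on=["A", "B", "C"], how="left", validate="many_to_one")

print(df["ID"].tolist())  # [0, 1, 0, 2, 1, 0]
```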
