Bonjorn

Reputation: 27

Counting groups in pandas

I am trying to group together the IDs that share a common REG value in a Pandas dataframe. My dataframe looks like this (I added the empty lines just for some extra legibility):

ID       REG
15      01497
15      01493

19      01706
19      01706-A

78      05710
78      01738

143     01626
143     01634

144     01626
144     01644

207     05255
207     01638

209     05255
209     03143

I want to create a new column that groups all the IDs that share the same REG number, so it would look like this:

ID     REG          GROUP
15     01497        1
15     01493        1

19     01706        2
19     01706-A      2

78     05710        3
78     01738        3

143    01626        4
143    01634        4

144    01626        4
144    01644        4

207    05255        5
207    01638        5

209    05255        5
209    03143        5
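
For reference, the input dataframe can be rebuilt like this (REG is kept as strings so the leading zeros and the -A suffix survive):

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [15, 15, 19, 19, 78, 78, 143, 143, 144, 144, 207, 207, 209, 209],
        "REG": ["01497", "01493", "01706", "01706-A", "05710", "01738",
                "01626", "01634", "01626", "01644", "05255", "01638",
                "05255", "03143"],
    }
)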

There are some questions that address similar problems (perhaps Pandas DataFrame Groupby two columns and get counts, How do I find common values within groups in pandas?, or Numbering Groups In Pandas DataFrame), but they are not quite the same: I am not aiming to perform a sum or count in the traditional way.

Upvotes: 1

Views: 141

Answers (3)

Timus

Reputation: 11321

With df being your dataframe, you could try the following:

df["GROUP"] = (
    df.groupby("ID", sort=False)["REG"].transform("first").to_frame()
    .groupby("REG", sort=False).ngroup().add(1)
)
  • First group by column ID and pick the respective first REG item for each ID-group.
  • Then map the newly built REG column onto group numbers (both steps are spelled out in the sketch below).
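
A rough equivalent of the same two steps spelled out separately (pandas.factorize is used here just as another way to number the distinct first-REG values in order of appearance):

import pandas as pd

# Step 1: broadcast the first REG of each ID back onto every row of that ID
first_reg = df.groupby("ID", sort=False)["REG"].transform("first")

# Step 2: number the distinct first-REG values in order of appearance (1-based)
df["GROUP"] = pd.factorize(first_reg)[0] + 1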

Result for the sample in the question:

     ID      REG  GROUP
0    15    01497      1
1    15    01493      1
2    19    01706      2
3    19  01706-A      2
4    78    05710      3
5    78    01738      3
6   143    01626      4
7   143    01634      4
8   144    01626      4
9   144    01644      4
10  207    05255      5
11  207    01638      5
12  209    05255      5
13  209    03143      5

Upvotes: 1

Galo do Leste

Reputation: 726

Assuming you have a dataframe df with ID as your index, try:

index_lookup = df.index.unique()
df['GROUP'] = df.apply(lambda x: index_lookup.get_loc(x.name), axis=1)
print(df)

If ID is not the index, you can either make it the index using:

df = df.set_index('ID')

or if you do not wish to change the index use:

index_lookup = list(df['ID'].unique())
df['GROUP'] = df.apply(lambda x: index_lookup.index(x['ID']), axis=1)

That should work

Upvotes: 0

Timeless

Reputation: 37737

One way is to use pandas.Series.duplicated with pandas.Series.cumsum:

df["GROUP"] = (~df["ID"].duplicated() ^ df["REG"].duplicated()).cumsum()

Output :

print(df)

     ID      REG  GROUP
0    15    01497      1
1    15    01493      1
2    19    01706      2
3    19  01706-A      2
4    78    05710      3
5    78    01738      3
6   143    01626      4
7   143    01634      4
8   144    01626      4
9   144    01644      4
10  207    05255      5
11  207    01638      5
12  209    05255      5
13  209    03143      5

Upvotes: 1
