Gannina

Reputation: 133

Pandas dataframe group by list

I have the following dataframe with names of people and their abbreviations. The aim is to perform name disambiguation:

    Names                       Abb
0   Michaele Frendu             [Mic, Fre]
1   Lucam Zamit                 [Luc, Zam]
2   magistro Johanne Luckys     [Joh, Luc]
3   Albano Fava                 [Alb, Fav]
4   Augustino Bagliu            [Aug, Bag]
5   Lucas Zamit                 [Luc, Zam]
6   Jngabellavit                [Jng]
7   Micheli Frendu              [Mic, Fre]
8   Luce                        [Luc]
9   Far                         [Far]

Can I group by the list column, so that for example rows 0 and 7 end up in one group and rows 1 and 5 in another? Later on I plan to do something similar with just the first names.

Upvotes: 1

Views: 61

Answers (2)

U13-Forward

Reputation: 71580

Or map:

df.groupby(df['Abb'].map(tuple)).do_something

The conversion is needed because lists are not hashable, so they cannot be used as group keys; tuples can.
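To make the `map(tuple)` idea concrete, here is a minimal sketch on a shortened copy of the question's data. The sample frame and the `names_by_abb` variable are my own illustration, not something from the question, and collecting names into a dict is just one possible stand-in for `do_something`.

```python
import pandas as pd

# Shortened, hand-typed copy of the question's data (an assumption for illustration).
df = pd.DataFrame({
    "Names": ["Michaele Frendu", "Lucam Zamit", "Lucas Zamit", "Micheli Frendu", "Luce"],
    "Abb": [["Mic", "Fre"], ["Luc", "Zam"], ["Luc", "Zam"], ["Mic", "Fre"], ["Luc"]],
})

# Convert each list to a tuple so it can be hashed and used as a group key.
keys = df["Abb"].map(tuple)

# Iterate over the groups and collect the names that share an abbreviation tuple.
names_by_abb = {key: group["Names"].tolist() for key, group in df.groupby(keys)}
print(names_by_abb)
```

Each dict key is one abbreviation tuple, and each value is the list of names that share it, which is the kind of grouping the question asks for.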

Upvotes: 0

jezrael

Reputation: 862661

If you want to group by a column of lists, you need to convert the column to tuples first:

def func(x):
    # x is the sub-DataFrame for one group
    print(x)
    # some code
    return x

df1 = df.groupby(df['Abb'].apply(tuple)).apply(func)

         Names         Abb
3  Albano Fava  [Alb, Fav]
         Names         Abb
3  Albano Fava  [Alb, Fav]
              Names         Abb
4  Augustino Bagliu  [Aug, Bag]
  Names    Abb
9   Far  [Far]
          Names    Abb
6  Jngabellavit  [Jng]
                     Names         Abb
2  magistro Johanne Luckys  [Joh, Luc]
  Names    Abb
8  Luce  [Luc]
         Names         Abb
1  Lucam Zamit  [Luc, Zam]
5  Lucas Zamit  [Luc, Zam]
             Names         Abb
0  Michaele Frendu  [Mic, Fre]
7   Micheli Frendu  [Mic, Fre]
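For the disambiguation goal, and for the follow-up about first names only, one possible approach is to label each row with a group number instead of printing the groups. This is only a sketch: the shortened sample frame and the column names `full_id` and `first_id` are made up for illustration, and it assumes the first element of each `Abb` list is the first-name abbreviation.

```python
import pandas as pd

# Shortened, hand-typed copy of the question's data (an assumption for illustration).
df = pd.DataFrame({
    "Names": ["Michaele Frendu", "Lucam Zamit", "Lucas Zamit", "Micheli Frendu", "Luce"],
    "Abb": [["Mic", "Fre"], ["Luc", "Zam"], ["Luc", "Zam"], ["Mic", "Fre"], ["Luc"]],
})

# Rows with the same full abbreviation tuple get the same integer label.
df["full_id"] = df.groupby(df["Abb"].apply(tuple)).ngroup()

# .str[0] picks the first element of each list; it is a plain string,
# so no tuple conversion is needed for this grouping.
df["first_id"] = df.groupby(df["Abb"].str[0]).ngroup()

print(df[["Names", "full_id", "first_id"]])
```

With these labels, "Lucam Zamit" and "Lucas Zamit" share a `full_id`, while "Luce" only joins them under `first_id`.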

Upvotes: 1
