Amin Ba

Reputation: 2436

How to modify groups of a grouped pandas dataframe

I have this dataframe:

import pandas as pd

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})

which looks like this:

    A   B   C
0   1   a   A
1   1   b   B
2   1   c   C
3   2   d   D
4   2   e   E
5   2   f   F
6   2   g   G

I want to do a groupby like this:

groups = s.groupby("A")

For example, group "2" (the values in column A are strings) is:

g2 = groups.get_group("2")

that looks like this:

    A   B   C
3   2   d   D
4   2   e   E
5   2   f   F
6   2   g   G
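A minimal sketch for inspecting all the groups at once, assuming the same `s` as above. Iterating over a groupby object yields `(key, sub-dataframe)` pairs. Because `[*'1112222']` unpacks the string into one-character strings, the keys are `'1'` and `'2'`, which is why `get_group` needs `"2"` and not the integer `2`.

```python
import pandas as pd

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})
groups = s.groupby("A")

# Keys come from column A, which holds strings, not integers.
keys = [key for key, _ in groups]
g2 = groups.get_group("2")

# Each iteration gives the group key and the rows belonging to it.
for key, g in groups:
    print(key, list(g.index))
```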

Anyway, I want to perform an operation on each group.

This is what the final result should look like:

    A   B   C   D
1   1   b   B   a=b;A=B
2   1   c   C   a=c;A=C
4   2   e   E   d=e;D=E
5   2   f   F   d=f;D=F
6   2   g   G   d=g;D=G

In other words, I am dropping the first row of each group, but first combining it with each of the other rows in that group to create column D.

Any idea how to do this?

Summary of what I want to do: group by A and drop the first row of each group. I also want to add a column D to the whole dataframe whose values combine each remaining row with the first row of its group.


What I have tried:

In order to solve this, I am going to create a function:

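def func(g):
    first_row_of_group = g.iloc[0]
    g = g.iloc[1:].copy()  # copy so adding a column doesn't raise SettingWithCopyWarning
    # axis=1 passes one row at a time; pair the first row's value with this row's value for B and C
    g["D"] = g.apply(
        lambda row: ";".join(f'{first_row_of_group[c]}={row[c]}' for c in ["B", "C"]),
        axis=1,
    )
    return g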

Then I am going to do this:

groups.apply(func)

Upvotes: 1

Views: 527

Answers (1)

user7864386


You can apply a custom function to each group that combines the values in the group's first row with each of the remaining rows and then removes the first row:

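def remove_first(x):
    first = x.iloc[0]        # the row to combine with the others
    x = x.iloc[1:].copy()    # drop it; copy to avoid SettingWithCopyWarning
    x['D'] = first['B'] + '=' + x['B'] + ';' + first['C'] + '=' + x['C']
    # an equivalent operation using positions instead of labels:
    # x['D'] = first.iloc[1] + '=' + x.iloc[:, 1] + ';' + first.iloc[2] + '=' + x.iloc[:, 2]
    return x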

s = s.groupby('A').apply(remove_first).droplevel(0)

Output:

   A  B  C        D
1  1  b  B  a=b;A=B
2  1  c  C  a=c;A=C
4  2  e  E  d=e;D=E
5  2  f  F  d=f;D=F
6  2  g  G  d=g;D=G
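If you would rather not go through `groupby().apply` (pandas 2.2 and later emit a deprecation warning when `apply` operates on the grouping column), here is a sketch that iterates over the groups explicitly and concatenates the pieces. `remove_first` is the function above, repeated so the snippet runs on its own. Since the original row labels are kept, no `droplevel` is needed.

```python
import pandas as pd

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})

def remove_first(x):
    first = x.iloc[0]
    x = x.iloc[1:].copy()
    x['D'] = first['B'] + '=' + x['B'] + ';' + first['C'] + '=' + x['C']
    return x

# Process each group separately, then stitch the results back together.
result = pd.concat([remove_first(g) for _, g in s.groupby('A')])
print(result)
```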

Note: The dataframe shown in your question is constructed from

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})

but you gave a different one as raw input.
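A vectorized alternative that avoids a per-group Python function entirely, offered as a sketch rather than a drop-in replacement: `transform('first')` broadcasts each group's first row onto every row of that group, and `cumcount()` numbers rows within each group starting at 0, so `cumcount() > 0` keeps everything except the first row.

```python
import pandas as pd

s = pd.DataFrame({'A': [*'1112222'], 'B': [*'abcdefg'], 'C': [*'ABCDEFG']})

grouped = s.groupby('A')
# Same index as s; each row holds B and C from the first row of its group.
first = grouped[['B', 'C']].transform('first')
# True for every row except the first one in its group.
keep = grouped.cumcount() > 0

out = s[keep].copy()
# String concatenation aligns on the index, so rows match up correctly.
out['D'] = (first.loc[keep, 'B'] + '=' + out['B'] + ';'
            + first.loc[keep, 'C'] + '=' + out['C'])
print(out)
```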

Upvotes: 1
