Giammaria Giordano
Giammaria Giordano

Reputation: 35

Merge rows dataframe based on two columns - Python

I have a data frame df that has the following information:

|   Project | class   | x     |y    | z   |
| ---       | ---     | ---   | --- | --- | 
| Project_A | c.java  |a1     | a2  |     |
| Project_A | c.java  |       |     | a3  |
| Project_b | t.java  |b1     |b2   |     |

I need to combine these lines if the values of project and Class are equal among them.

Based on the previous example, the aspected output in this case have be:

| Project   | class  | x     |y    | z   |
| ---       | ---    | ---   | --- | --- |
| Project_A | c.java |a1     |a2   | a3  |
| Project_b | t.java |b1     |b2   |     |

it is also important to note that the way the dataset was constructed, there is no risk of rewriting values; so, in other words, you will never have such a situation:

| Project   | class  | x   |y   | z   |
| ---       | ---    | --- | ---| --- |
| Project_A | c.java |a1   | a2 |     |
| Project_A | c.java |a_x  | a_y|a_z  |
| Project_b | t.java |b1   |b2  |     |

How can do it?

Upvotes: 0

Views: 527

Answers (2)

ArchAngelPwn
ArchAngelPwn

Reputation: 3046

This will groupby the project and class and then find the first value of the grouping. I'm not sure if your data will allow for something like this since if there is another example of data in the column for the project/class combination it might mess up some of your data

df.groupby(['Project', 'class'], as_index=False).agg('first')

Upvotes: 2

NirF
NirF

Reputation: 279

You can use groupby() on both columns and sum the others:

X = pd.DataFrame({
        'Project':["Project_A","Project_A",'Project_b'],
        'class':["c.java","c.java","t.java"],
        'x':["a1",None,"b1"],
        'y':["a2",None,"b2"],
        'z':[None,"a3",None]})

Y= X.groupby(['Project','class']).sum()
print(Y)

Output:

                   x   y   z
Project   class
Project_A c.java  a1  a2  a3
Project_b t.java  b1  b2   0

Upvotes: 1

Related Questions