Lucas
Lucas

Reputation: 91

Merge rows with same index in a DataFrame

I have a DataFrame with species and their occurence. An excerpt follows:

+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+
| Family      | Genus      | Species                                   | Species code | Nearctic Mexico | Mesoamerica | Caribbean | Los Llanos | Amazonia |
+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+
| Acanthaceae | Aphelandra | Aphelandra castaneifolia Britton ex Rusby | Aphecast     |                 |             |           |            |          |
+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+
| Acanthaceae | Aphelandra | Aphelandra pilosa Leonard                 | Aphpilo      |                 | +           |           | +          | +        |
+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+
| Acanthaceae | Aphelandra | Aphelandra pulcherrima (Jacq.) Kunth      | Aphepulc     |                 | +           | +         | +          | +        |
+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+
| Acanthaceae | Aphelandra | Aphelandra sinclairiana Nees              | Aphesinc     |                 | +           |           |            |          |
+-------------+------------+-------------------------------------------+--------------+-----------------+-------------+-----------+------------+----------+

I need to analyze genera only, but preserve their occurences as follows. I'm sorry if that's a real rookie question, but I'm having trouble putting it in words what I intend to do. This is the expected result:

+-------------+------------+-----------------+-------------+-----------+------------+----------+
| Family      | Genus      | Nearctic Mexico | Mesoamerica | Caribbean | Los Llanos | Amazonia |
+-------------+------------+-----------------+-------------+-----------+------------+----------+
| Acanthaceae | Aphelandra |                 | +           | +         | +          | +        |
+-------------+------------+-----------------+-------------+-----------+------------+----------+

I have found similar problems, but dealing with numeric data and I couldn't manage to adapt the solution.

Upvotes: 1

Views: 451

Answers (1)

jezrael
jezrael

Reputation: 862441

Use GroupBy.first for first non misisng values for 5th+ columns:

#if necessary replace empty space to missing values
df = df.replace('',np.nan)

cols = df.columns[4:].tolist()
df = df.groupby(['Family','Genus'], as_index=False)[cols].first()
print (df)
        Family       Genus  Nearctic Mexico Mesoamerica Caribbean Los Llanos  \
0  Acanthaceae  Aphelandra              NaN           +         +          +   

  Amazonia  
0        +  

Upvotes: 1

Related Questions