Cesar
Cesar

Reputation: 585

Drop NaN values by group

I have a dataframe with the 'Materiais' column that indicates the code of the product, the 'Values' Column, that corresponds to the product values, and 'Month' columns that corresponds to it months.

       Year   Month Materiais  Values
0      2018    M1     52745     NaN
1      2018    M2     52745     NaN
2      2018    M3     52745     NaN
3      2018    M4     52745     NaN
4      2018    M5     52745     NaN
5      2018    M6     52745     NaN
6      2018    M7     52745     NaN
7      2018    M1     58859     NaN
8      2018    M2     58859     NaN
9      2018    M3     58859     NaN
10     2018    M4     58859     NaN
11     2018    M5     58859     300
12     2018    M6     58859     NaN
13     2018    M7     58859     NaN
14     2018    M1     57673     NaN
15     2018    M2     57673     100
16     2018    M3     57673     NaN
17     2018    M4     57673     150

1-) I would like to have in this dataframe, only products that had at least a value in one month.

So my idea is to group all the similar products codes and check if there is at least one value != NaN.

In order to group, I am using this function:

df = df_demand.groupby(['Materiais'], sort=False, as_index=False)

Result 1

2-) Filter the months that there is at least one value != NaN', using a similar approach:

df = df_demanda.groupby(['Month'], sort=False, as_index=False)

Upvotes: 1

Views: 680

Answers (2)

PMende
PMende

Reputation: 5460

First, you want to get an aggregate that indicates whether that given product has a non-null value (nan's aren't "truthy"):

materiais_any_non_null = df.groupby('Materiais')['Values'].transform('any')

Then just mask your df with that:

df[materiais_any_non_null]

Result:

    Year Month  Materiais  Values
7   2018    M1      58859     NaN
8   2018    M2      58859     NaN
9   2018    M3      58859     NaN
10  2018    M4      58859     NaN
11  2018    M5      58859   300.0
12  2018    M6      58859     NaN
13  2018    M7      58859     NaN
14  2018    M1      57673     NaN
15  2018    M2      57673   100.0
16  2018    M3      57673     NaN
17  2018    M4      57673   150.0

Upvotes: 1

CSY
CSY

Reputation: 208

  1. This will drop rows with Values = NaN:

newdf = df.dropna(subset=['Values'])

  1. Your code should be fine.

Upvotes: 0

Related Questions