JohnTheBadge

Reputation: 67

Zero-Variance Removal

I want to remove sensors that appear to have no variance. I have already removed all sensors with temperature=0 and can sort by date/day of the week, but further errors in the data have come to light. Some sensors have a string of temperature recordings of 4.5 or 7.3 with no change across many days. I want reproducible code, so I don't want to simply remove the 4.5 and 7.3 values.

In [1]: df = pd.DataFrame([['A', 2.045], ['A', 3.056], ['B', 6], ['B', 6]], columns=['Sen', 'Temp'])

In [2]: df
Out[2]:
  Sen   Temp
0   A  2.045
1   A  3.056
2   B      6
3   B      6

So I have grouped the data using basic group and sort functions to get a simple output like the one above. However, I want to remove all "B" sensors from df.Sen, because the variance of the values in df.Temp for B equals 0. I'm getting confused just typing this out, but is this possible? I was thinking of creating a new column based on a variance calculation and deleting rows that way, but is there a simpler solution?

Desired output:

  Sen   Temp
0   A  2.045
1   A  3.056

Upvotes: 1

Views: 615

Answers (1)

Quang Hoang

Reputation: 150825

You can use groupby().transform() to build a boolean mask from each sensor's variance:

df[df.groupby('Sen').Temp.transform('var') > 0]

Output:

  Sen   Temp
0   A  2.045
1   A  3.056
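For anyone who wants to run this end to end, here is a minimal self-contained sketch using the sample frame from the question. The names `group_var` and `kept` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question: sensor B never changes.
df = pd.DataFrame(
    [["A", 2.045], ["A", 3.056], ["B", 6], ["B", 6]],
    columns=["Sen", "Temp"],
)

# transform() broadcasts each sensor's sample variance back onto its rows,
# so the result lines up with df and can be used directly as a mask.
group_var = df.groupby("Sen")["Temp"].transform("var")

# Keep only the rows whose sensor shows some variation.
kept = df[group_var > 0]
print(kept)
```

Because the mask is computed from the data rather than from hard-coded values such as 4.5 or 7.3, the same line should keep working when other stuck readings appear.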

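However, this might fail if some groups have only one valid data point: the sample variance of a single value is NaN, and NaN > 0 evaluates to False, so those sensors are dropped as well. Alternatively, since zero variance means the group contains only one distinct value, you can use nunique: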

df[df.groupby('Sen').Temp.transform('nunique') > 1]
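To see how both masks treat a sensor with a single reading, here is a small sketch. Sensor "C" and its value are made up for illustration, and keeping single-reading sensors is an assumed policy rather than something the question asks for.

```python
import pandas as pd

# Question's sample data plus a hypothetical sensor C with one reading.
df = pd.DataFrame(
    [["A", 2.045], ["A", 3.056], ["B", 6], ["B", 6], ["C", 4.5]],
    columns=["Sen", "Temp"],
)

grouped = df.groupby("Sen")["Temp"]

# Variance mask: C's sample variance is NaN, and NaN > 0 is False.
var_mask = grouped.transform("var") > 0

# nunique mask: C has exactly one distinct value, so it is also False.
nunique_mask = grouped.transform("nunique") > 1

# Assumed policy: a sensor with only one valid reading cannot be judged,
# so keep it. count() ignores NaN, so it counts valid readings per sensor.
keep_singletons = nunique_mask | (grouped.transform("count") == 1)

print(df[var_mask])
print(df[nunique_mask])
print(df[keep_singletons])
```

Both masks drop the single-reading sensor along with the constant one, so if such sensors should survive, an explicit rule like `keep_singletons` is needed.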

Upvotes: 2
