Reputation: 67
I want to remove sensors which appear to have no variance. I removed all sensors with a temperature=0, and can sort by date/day of the week, but further errors within the data have come to light. Some sensors have a string of temperature recordings of 4.5 and 7.3 with no change across many days. I want reproducible code, so I don't want to simply remove the 4.5 and 7.3 values.
In [1]: df = pd.DataFrame([['A', 2.045], ['A', 3.056], ['B', 6], ['B', 6]], columns=['Sen', 'Temp'])
In [2]: df
Out[2]:
  Sen   Temp
0   A  2.045
1   A  3.056
2   B      6
3   B      6
So I have grouped the data using basic group and sort functions to get a simple output as above. However, I want to remove all "B" sensors from df.Sen, as the variance of the values within df.Temp for B equals 0. I'm getting confused just typing this out, but is this possible? I was thinking of creating a new column based on a variance calculation and deleting rows that way, but is there a simpler solution?
Desired output:
  Sen   Temp
0   A  2.045
1   A  3.056
Upvotes: 1
Views: 615
Reputation: 150825
You can use groupby().transform() to mask the variance:
df[df.groupby('Sen').Temp.transform('var') > 0]
Output:
  Sen   Temp
0   A  2.045
1   A  3.056
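To see why this works, here is a minimal sketch that splits the one-liner into steps, using the sample frame from the question. The intermediate names group_var, mask and filtered are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [['A', 2.045], ['A', 3.056], ['B', 6], ['B', 6]],
    columns=['Sen', 'Temp'],
)

# transform('var') returns one value per original row:
# the variance of the group that row belongs to.
group_var = df.groupby('Sen')['Temp'].transform('var')

# The comparison gives a boolean Series aligned with df's index,
# so it can be used directly to select rows.
mask = group_var > 0
filtered = df[mask]

print(group_var)
print(filtered)
```

Because transform keeps the original index, no merge or helper column is needed to map the per-group variance back onto the rows.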
However, this might not behave as expected if some groups have only one valid data point, since the sample variance of a single value is NaN rather than 0. On the other hand, since variance 0 means only one distinct value across the group, you can use nunique:
df[df.groupby('Sen').Temp.transform('nunique') > 1]
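The single-data-point case can be checked with a small sketch. It assumes a hypothetical sensor C with one reading, which is not in the original data, and the variable names are illustrative.

```python
import pandas as pd

# Question's sample data plus a hypothetical sensor 'C' with a single reading
df = pd.DataFrame(
    [['A', 2.045], ['A', 3.056], ['B', 6], ['B', 6], ['C', 4.5]],
    columns=['Sen', 'Temp'],
)

grouped = df.groupby('Sen')['Temp']

# Sample variance (ddof=1) of a single value is NaN, not 0
per_row_var = grouped.transform('var')

# Number of distinct values per group; always a well-defined integer
per_row_nunique = grouped.transform('nunique')

by_var = df[per_row_var > 0]          # NaN > 0 is False, so 'C' is dropped
by_nunique = df[per_row_nunique > 1]  # 'C' has 1 distinct value, so it is dropped

print(per_row_var)
print(per_row_nunique)
print(by_nunique)
```

On this data both filters keep only sensor A. The difference is that nunique gives an explicit count of 1 for sensor C instead of a NaN, so the comparison never has to deal with a missing value. If single-reading sensors should be kept, the condition would need to be adjusted accordingly.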
Upvotes: 2