Reputation: 536
I've some data that I group by occurrence in one second intervals. I'm having issues finding the correct way to filter out counts below a certain threshold, e.g. I would not want to show anything below a count of 100. I've tried various versions of filter / lambda constructs but I was not able to filter before or after the .count() method is called.
df = pd.DataFrame({
'Date': timestamps,
'Path': paths,
})
y = df.groupby([pd.Grouper(freq='1s',key='Date'), 'Path'])
print(y.count())
The output currently resembles this:
Date Path
2015-12-26 06:08:27 rawdata/file3 1
rawdata/file2 118
2015-12-26 06:08:28 rawdata/file1 85
rawdata/file2 9796
I would like it to look like this:
Date Path
2015-12-26 06:08:27 rawdata/file2 118
2015-12-26 06:08:28 rawdata/file2 9796
Upvotes: 3
Views: 12797
Reputation: 862571
You can try rename column Date
to Count
and then subset of rows, where column Count
is < 3
(you can change it to value 100
):
print df
Date Path
0 2015-12-26 06:08:27 rawdata/file3
1 2015-12-26 06:08:27 rawdata/file2
2 2015-12-26 06:08:27 rawdata/file2
3 2015-12-26 06:08:27 rawdata/file2
4 2015-12-26 06:08:27 rawdata/file2
5 2015-12-26 06:08:27 rawdata/file2
6 2015-12-26 06:08:27 rawdata/file2
7 2015-12-26 06:08:28 rawdata/file1
8 2015-12-26 06:08:28 rawdata/file1
9 2015-12-26 06:08:28 rawdata/file1
10 2015-12-26 06:08:28 rawdata/file1
11 2015-12-26 06:08:28 rawdata/file1
12 2015-12-26 06:08:28 rawdata/file2
y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count().rename(columns={'Date':'Count'})
print(y)
Count
Date Path
2015-12-26 06:08:27 rawdata/file2 6
rawdata/file3 1
2015-12-26 06:08:28 rawdata/file1 5
rawdata/file2 1
print y[y.Count < 3]
Count
Date Path
2015-12-26 06:08:27 rawdata/file3 1
2015-12-26 06:08:28 rawdata/file2 1
Next approach is set column name:
y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count()
print(y)
y.columns = ['Count']
print y[y.Count < 3]
Upvotes: 2