bfloriang
bfloriang

Reputation: 536

Pandas filter counts

I've some data that I group by occurrence in one second intervals. I'm having issues finding the correct way to filter out counts below a certain threshold, e.g. I would not want to show anything below a count of 100. I've tried various versions of filter / lambda constructs but I was not able to filter before or after the .count() method is called.

df = pd.DataFrame({
     'Date': timestamps,
     'Path': paths,
     })

y = df.groupby([pd.Grouper(freq='1s',key='Date'), 'Path'])
print(y.count())

The output currently resembles this:

Date                Path
2015-12-26 06:08:27 rawdata/file3             1
                    rawdata/file2           118
2015-12-26 06:08:28 rawdata/file1            85
                    rawdata/file2          9796

I would like it to look like this:

Date                Path
2015-12-26 06:08:27 rawdata/file2           118
2015-12-26 06:08:28 rawdata/file2          9796

Upvotes: 3

Views: 12797

Answers (1)

jezrael
jezrael

Reputation: 862571

You can try rename column Date to Count and then subset of rows, where column Count is < 3 (you can change it to value 100):

print df
                  Date           Path
0  2015-12-26 06:08:27  rawdata/file3
1  2015-12-26 06:08:27  rawdata/file2
2  2015-12-26 06:08:27  rawdata/file2
3  2015-12-26 06:08:27  rawdata/file2
4  2015-12-26 06:08:27  rawdata/file2
5  2015-12-26 06:08:27  rawdata/file2
6  2015-12-26 06:08:27  rawdata/file2
7  2015-12-26 06:08:28  rawdata/file1
8  2015-12-26 06:08:28  rawdata/file1
9  2015-12-26 06:08:28  rawdata/file1
10 2015-12-26 06:08:28  rawdata/file1
11 2015-12-26 06:08:28  rawdata/file1
12 2015-12-26 06:08:28  rawdata/file2


y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count().rename(columns={'Date':'Count'})
print(y)
                                   Count
Date                Path                
2015-12-26 06:08:27 rawdata/file2      6
                    rawdata/file3      1
2015-12-26 06:08:28 rawdata/file1      5
                    rawdata/file2      1

print y[y.Count < 3]
                                   Count
Date                Path                
2015-12-26 06:08:27 rawdata/file3      1
2015-12-26 06:08:28 rawdata/file2      1

Next approach is set column name:

y = df.groupby([pd.Grouper(freq='1s', key='Date'), 'Path']).count()
print(y)
y.columns = ['Count']

print y[y.Count < 3]

Upvotes: 2

Related Questions