Reputation: 21
I have a pandas DataFrame.For example,
Date Time A B C
0 1.1.2015 00:00 2 16 50
1 1.1.2015 01:00 2 9 50
2 1.1.2015 02:00 4 6 50
3 1.1.2015 03:00 3 7 31
4 1.1.2015 04:00 2 7 42
5 1.1.2015 05:00 2 7 22
6 1.1.2015 06:00 2 7 14
7 1.1.2015 07:00 2 11 50
8 1.1.2015 08:00 3 11 28
9 1.1.2015 09:00 2 18 17
I want to delete numbers that successively repeated more than 3 times but keep the first one. I need to delete:
1- rows 5,6 and 7 because there are four 2s in column A and i dont need the last three.
2- rows 4,5 and 6 because there are four 7s in column B
3- rows 1 and 2 because there are three 50s in column C
So the output i want is like:
Date Time A B C
0 1.1.2015 00:00 2 16 50
1 1.1.2015 03:00 3 7 31
2 1.1.2015 08:00 3 11 28
3 1.1.2015 09:00 2 18 17
I have searched similar problems and i find this one the most similar: "Removing values that repeat more than 5 times in Pandas DataFrame" and i tried to adjust it to my problem but i could not(i am a beginner at python). Can anyone help me?
Thanks.
Upvotes: 2
Views: 824
Reputation: 109
def remover(x=[]):
new = []
cnt = Counter
op = list(cnt(x).elements())
op = np.array(op,dtype=np.int32)
x = pd.Series(op)
num = x.value_counts()
num = np.array(num,dtype = np.int32)
for val in num:
if val >= 3:
del(val)
new.append(val)
print(new)
Upvotes: 0
Reputation: 31181
You can do, with itertools
help:
import itertools
import numpy as np
def f(serie):
xs = []
for el, gr in itertools.groupby(serie):
x = np.repeat(True, len(list(gr)))
if len(x)>=3:
x[1:]=False
xs.append(x)
return np.concatenate(xs)
df[df[['A','B','C']].apply(f, axis=0).apply(np.all, axis=1)]
#Out[64]:
# Date Time A B C
#0 1.1.2015 00:00 2 16 50
#3 1.1.2015 03:00 3 7 31
#8 1.1.2015 08:00 3 11 28
#9 1.1.2015 09:00 2 18 17
The idea is to use the utility function f
to count the number of consecutive element in a column and create the associated desired boolean mask - eg you can check the result of f(df['A'])
. And then aggregate these boolean masks using np.all
to filter your original dataframe.
Upvotes: 1