Reputation: 632
Let's say I have this dataframe:
A | B | C
---------
n | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
n | b | c
n | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
I want to fill and replace the column A rows values with 'x'. The rows to fill are the ones before 's' and after 'e' but not in between. So the result would be somthing like this :
A | B | C
---------
x | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
x | b | c
x | b | c
s | b | c
n | b | c
n | b | c
n | b | c
e | b | c
Here's what I have tried :
def applyFunc(s):
if 's' in str(s):
return 'x'
return ''
df['A'] = df['A'].apply(applyFunc)
But this only replaces rows where there is 's'.
Upvotes: 1
Views: 1396
Reputation: 765
solution1:
df1.assign(col1=(df1.A=='s').cumsum())\
.assign(col2=(df1.A=='e').cumsum().shift().fillna(0))\
.assign(A=lambda dd:dd.A.mask(dd.col1==dd.col2,'x'))
A B C s e
0 x b c 0 NaN
1 x b c 0 0.0
2 x b c 0 0.0
3 s b c 1 0.0
4 n b c 1 0.0
5 n b c 1 0.0
6 n b c 1 0.0
7 e b c 1 0.0
8 x b c 1 -1.0
9 x b c 1 -1.0
10 s b c 2 -1.0
11 n b c 2 -1.0
12 n b c 2 -1.0
13 n b c 2 -1.0
14 e b c 2 -1.0
solution2:
def function1(dd:pd.DataFrame):
dd.loc[:dd.query("A=='s'").index.values[-1]-1,'A']='x'
return dd.drop('col1',axis=1)
df1.assign(col1=(df1.A=='e').cumsum().shift()).groupby('col1').apply(function1)
Upvotes: 0
Reputation: 260335
First find the rows where a value is after 'e' or 's' with:
A = d['A'] # enables shorter reference to df['A']
A.where(A.isin(['e', 's'])).ffill().fillna('e')
['e', 'e', 'e', 's', 's', 's', 's', 'e', 'e', 'e', 's', 's', 's', 's', 'e']
Then find the 'n' where is it after a 's' and replace with 'x':
df['new_A'] = A.mask((A.where(A.isin(['e', 's'])).ffill().fillna('e').eq('e')&A.eq('n')), 'x')
output:
A B C new_A
0 n b c x
1 n b c x
2 n b c x
3 s b c s
4 n b c n
5 n b c n
6 n b c n
7 e b c e
8 n b c x
9 n b c x
10 s b c s
11 n b c n
12 n b c n
13 n b c n
14 e b c e
NB. I saved the output in a new column for clarity, but the real code should be df['A'] = …
Upvotes: 4
Reputation: 35626
Assuming that there are no duplicate s
or e
within groups, we can Series.mask
the n
values between s
and e
. We can track if we're between s
and e
by comparing there the Series.cumsum
of s
and e
are equal:
df['A'] = df['A'].mask(
df['A'].eq('s').cumsum().eq(df['A'].eq('e').cumsum()) & df['A'].eq('n'),
'x'
)
df
:
A B C
0 x b c
1 x b c
2 x b c
3 s b c
4 n b c
5 n b c
6 n b c
7 e b c
8 x b c
9 x b c
10 s b c
11 n b c
12 n b c
13 n b c
14 e b c
Breakdown of steps as columns:
# See Where S are
df['S cumsum'] = df['A'].eq('s').cumsum()
# See where E are
df['E cumsum'] = df['A'].eq('e').cumsum()
# See where S and E are the same meaning we have seen both or neither but
# not one or the other
df['S == E cumsum'] = df['S cumsum'].eq(df['E cumsum'])
# See where A is n
df['S == E cumsum AND A == n'] = df['S == E cumsum'] & df['A'].eq('n')
A B C S cumsum E cumsum S == E cumsum S == E cumsum AND A == n
0 n b c 0 0 True True
1 n b c 0 0 True True
2 n b c 0 0 True True
3 s b c 1 0 False False
4 n b c 1 0 False False
5 n b c 1 0 False False
6 n b c 1 0 False False
7 e b c 1 1 True False
8 n b c 1 1 True True
9 n b c 1 1 True True
10 s b c 2 1 False False
11 n b c 2 1 False False
12 n b c 2 1 False False
13 n b c 2 1 False False
14 e b c 2 2 True False
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
'A': ['n', 'n', 'n', 's', 'n', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'n',
'e'],
'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
'b'],
'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
'c']
})
If there are duplicates we can filter out the desired start and end values (s
and e
) and take only even groups:
df
:
df = pd.DataFrame({
'A': ['n', 'n', 'n', 's', 's', 'n', 'n', 'e', 'n', 'n', 's', 'n', 'n', 'e',
'e'],
'B': ['b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b',
'b'],
'C': ['c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c',
'c']
})
A B C
0 n b c
1 n b c
2 n b c
3 s b c
4 s b c # Duplicate S
5 n b c
6 n b c
7 e b c
8 n b c
9 n b c
10 s b c
11 n b c
12 n b c
13 e b c
14 e b c # Duplicate E
Find s
and e
and filter to keep only even groups:
s = df.loc[df['A'].isin(['s', 'e']), 'A']
df['A'] = df['A'].mask(
((df.index.isin(s[s.ne(s.shift())].index).cumsum() % 2) == 0)
& df['A'].eq('n'),
'x'
)
df
:
A B C
0 x b c
1 x b c
2 x b c
3 s b c
4 s b c
5 n b c
6 n b c
7 e b c
8 x b c
9 x b c
10 s b c
11 n b c
12 n b c
13 e b c
14 e b c
Upvotes: 2