I am trying to delete rows from a pandas DataFrame, specifically when the row beneath X in Col A is empty. If the row underneath X in Col A is empty, I want to delete all of those rows until there is a string underneath the value X.
import pandas as pd
d = ({
    'A' : ['X','','','X','Foo','','X','Fou','','X','Bar'],
    'B' : ['Val',1,3,'Val',1,3,'Val',1,3,'Val',1],
    'C' : ['Val',2,4,'Val',2,4,'Val',2,4,'Val',2],
    })
df = pd.DataFrame(data=d)
Output:
      A    B    C
0     X  Val  Val
1          1    2
2          3    4
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
I have tried:
df = df[~(df['A'] == 'X').shift().fillna(False)]
But this deletes every row that follows an X. I only want rows deleted if the row directly below X is empty.
Intended:
     A    B    C
0    X  Val  Val
1  Foo    1    2
2         3    4
3    X  Val  Val
4  Fou    1    2
5         3    4
6    X  Val  Val
7  Bar    1    2
Upvotes: 4
Views: 564
Reputation: 863166
Use:
m1 = df['A'] == 'X'
g = m1.cumsum()
m = (df['A'] == '') | m1
df = df[~m.groupby(g).transform('all')]
print (df)
      A    B    C
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
Details:
m1 = df['A'] == 'X'
g = m1.cumsum()
m = (df['A'] == '') | m1
print (pd.concat([df,
                  df['A'] == 'X',
                  m1.cumsum(),
                  (df['A'] == ''),
                  m,
                  m.groupby(g).transform('all'),
                  ~m.groupby(g).transform('all')], axis=1,
                 keys=['orig','==X','g','==space','m', 'all', 'inverted all']))
    orig              ==X  g  ==space      m    all  inverted all
       A    B    C      A  A        A      A      A             A
0      X  Val  Val   True  1    False   True   True         False
1           1    2  False  1     True   True   True         False
2           3    4  False  1     True   True   True         False
3      X  Val  Val   True  2    False   True  False          True
4    Foo    1    2  False  2    False  False  False          True
5           3    4  False  2     True   True  False          True
6      X  Val  Val   True  3    False   True  False          True
7    Fou    1    2  False  3    False  False  False          True
8           3    4  False  3     True   True  False          True
9      X  Val  Val   True  4    False   True  False          True
10   Bar    1    2  False  4    False  False  False          True
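Explanation:
1. Compare column A with X and take the cumulative sum, so that each group starts with an X, and assign it to g
2. Combine the masks for X and for the empty string into m
3. Use groupby with transform and DataFrameGroupBy.all to return True for every row of a group that contains only True values
4. Invert the result and filter by boolean indexing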
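The steps above can be wrapped in a small helper. This is only a sketch: the function name `drop_marker_only_groups` and its `col` and `marker` parameters are made up for illustration, and the final `reset_index(drop=True)` is added so the index matches the question's intended output.

```python
import pandas as pd

def drop_marker_only_groups(df, col="A", marker="X"):
    """Drop each group that holds only the marker row and empty strings."""
    is_marker = df[col] == marker
    # Every marker row starts a new group: 1, 1, 1, 2, 2, 2, ...
    group_id = is_marker.cumsum()
    # True for rows that are the marker itself or an empty string
    marker_or_empty = is_marker | (df[col] == "")
    # True for every row of a group that has no other value in `col`
    drop = marker_or_empty.groupby(group_id).transform("all")
    return df[~drop].reset_index(drop=True)

df = pd.DataFrame({
    "A": ["X", "", "", "X", "Foo", "", "X", "Fou", "", "X", "Bar"],
    "B": ["Val", 1, 3, "Val", 1, 3, "Val", 1, 3, "Val", 1],
    "C": ["Val", 2, 4, "Val", 2, 4, "Val", 2, 4, "Val", 2],
})
result = drop_marker_only_groups(df)
print(result)
```

Rows that come before the first marker all share group 0 and are judged the same way, so a leading run of only empty strings would be dropped as well.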
Upvotes: 1
Reputation: 2009
This is your solution:
(df['A'] == 'X').shift()
0 NaN
1 True
2 False
3 False
4 True
5 False
6 False
7 True
8 False
9 False
10 True
Name: A, dtype: object
In [15]:
(df['A'] == '')
Out[15]:
0 False
1 True
2 True
3 False
4 False
5 True
6 False
7 False
8 True
9 False
10 False
Name: A, dtype: bool
In [14]:
((df['A'] == '') & (df['A'] == 'X').shift())
Out[14]:
0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
Name: A, dtype: bool
And the result is:
df[~((df['A'] == '') & (df['A'] == 'X').shift())]
Out[16]:
      A    B    C
0     X  Val  Val
2          3    4
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
EDIT: If needed, you can repeat the filter in a while loop until the size stops changing:
old_size_df = df.size
new_size_df = 0
while old_size_df != new_size_df:
    old_size_df = df.size
    df = df[~((df['A'] == '') & (df['A'] == 'X').shift())]
    new_size_df = df.size
      A    B    C
0     X  Val  Val
3     X  Val  Val
4   Foo    1    2
5          3    4
6     X  Val  Val
7   Fou    1    2
8          3    4
9     X  Val  Val
10  Bar    1    2
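The loop re-filters the frame until nothing changes. As an alternative, the same rows can be found in one pass by labelling each run of empty rows with the non-empty value above it. A sketch under assumptions: `drop_empty_runs_after` and its parameters are hypothetical names, and empty cells are assumed to be `''` strings rather than NaN. Like the loop, it keeps the X row itself.

```python
import pandas as pd

def drop_empty_runs_after(df, col="A", marker="X"):
    """Drop the run of empty strings that directly follows each marker row."""
    empty = df[col] == ""
    # Each non-empty row opens a new block; the empty rows below it
    # inherit that block number.
    block = (~empty).cumsum()
    # Broadcast the first value of every block to all of its rows
    block_head = df[col].groupby(block).transform("first")
    return df[~(empty & (block_head == marker))]

df = pd.DataFrame({
    "A": ["X", "", "", "X", "Foo", "", "X", "Fou", "", "X", "Bar"],
    "B": ["Val", 1, 3, "Val", 1, 3, "Val", 1, 3, "Val", 1],
    "C": ["Val", 2, 4, "Val", 2, 4, "Val", 2, 4, "Val", 2],
})
result = drop_empty_runs_after(df)
print(result)
```

On the sample data this keeps index 0 and 3 through 10, the same rows the while loop leaves behind.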
Upvotes: 0
Reputation: 5573
Here is a solution with a custom apply function:
import pandas as pd

d = ({
    'A' : ['X','','','X','Foo','','X','Fou','','X','Bar'],
    'B' : ['Val',1,3,'Val',1,3,'Val',1,3,'Val',1],
    'C' : ['Val',2,4,'Val',2,4,'Val',2,4,'Val',2],
    })
df = pd.DataFrame(data=d)

# Tracks whether the scan is still inside the run of empty rows after an X
is_x = False

def fill_empty_a(row):
    global is_x
    if row['A'] == '' and is_x:
        # Mark the row for removal by dropna()
        row['A'] = None
    else:
        is_x = row['A'] == 'X'
    return row

(df.apply(fill_empty_a, axis=1)
   .dropna()
   .reset_index(drop=True))
#      A    B    C
# 0    X  Val  Val
# 1    X  Val  Val
# 2  Foo    1    2
# 3         3    4
# 4    X  Val  Val
# 5  Fou    1    2
# 6         3    4
# 7    X  Val  Val
# 8  Bar    1    2
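The same scan can be written without a module-level `global` and without `dropna()`, which would also drop rows that have missing values in other columns. A minimal sketch, where the helper name `rows_to_keep` is an assumption for illustration:

```python
import pandas as pd

def rows_to_keep(values, marker="X"):
    """Return one boolean per value; False marks an empty string that
    follows the marker directly or through a run of empty strings."""
    keep = []
    after_marker = False
    for value in values:
        if value == "" and after_marker:
            keep.append(False)
        else:
            keep.append(True)
            # Only a marker row switches the "drop empties" state on
            after_marker = value == marker
    return keep

df = pd.DataFrame({
    "A": ["X", "", "", "X", "Foo", "", "X", "Fou", "", "X", "Bar"],
    "B": ["Val", 1, 3, "Val", 1, 3, "Val", 1, 3, "Val", 1],
    "C": ["Val", 2, 4, "Val", 2, 4, "Val", 2, 4, "Val", 2],
})
mask = rows_to_keep(df["A"])
result = df.loc[mask].reset_index(drop=True)
print(result)
```

Keeping the state inside the function means the helper can be called repeatedly without resetting anything between calls.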
Upvotes: 0