Reputation: 223
My dataframe is as below:
df
time home_team away_team full_time_result both_teams_to_score double_chance League
-- ------------------- ------------ ------------------ --------------------------------- ------------------------- ------------------------------------ ----------------
0 2021-01-08 19:45:00 Charlton Accrington Stanley {'1': 2370, 'X': 3400, '2': 3000} {'yes': 1900, 'no': 1900} {'1X': 1360, '12': 1300, '2X': 1530} England League 1
1 2021-01-09 12:30:00 Lincoln City Peterborough {'1': 2290, 'X': 3400, '2': 3100} {'yes': 1800, 'no': 1950} {'1X': 1360, '12': 1300, '2X': 1570} England League 1
2 2021-01-09 13:00:00 Gillingham Burton Albion {'1': 2200, 'X': 3400, '2': 3300} {'yes': 1700, 'no': 2040} {'1X': 1330, '12': 1300, '2X': 1610} England League 1
3 2021-01-09 17:30:00 Ipswich Swindon {'1': None, 'X': None, '2': None} {'yes': 1750, 'no': 2000} {'1X': 1220, '12': 1250, '2X': 1900} England League 1
How can I delete rows containing None, as in this example in the column full_time_result?
I want to delete the row whose value is {'1': None, 'X': None, '2': None}.
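For reference, the relevant frame can be rebuilt for testing like this (assuming the odds columns hold plain Python dicts, with None where the odds are missing):

import pandas as pd

# Rebuild the frame shown above
df = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-08 19:45:00', '2021-01-09 12:30:00',
                            '2021-01-09 13:00:00', '2021-01-09 17:30:00']),
    'home_team': ['Charlton', 'Lincoln City', 'Gillingham', 'Ipswich'],
    'away_team': ['Accrington Stanley', 'Peterborough', 'Burton Albion', 'Swindon'],
    'full_time_result': [{'1': 2370, 'X': 3400, '2': 3000},
                         {'1': 2290, 'X': 3400, '2': 3100},
                         {'1': 2200, 'X': 3400, '2': 3300},
                         {'1': None, 'X': None, '2': None}],
    'both_teams_to_score': [{'yes': 1900, 'no': 1900}, {'yes': 1800, 'no': 1950},
                            {'yes': 1700, 'no': 2040}, {'yes': 1750, 'no': 2000}],
    'double_chance': [{'1X': 1360, '12': 1300, '2X': 1530},
                      {'1X': 1360, '12': 1300, '2X': 1570},
                      {'1X': 1330, '12': 1300, '2X': 1610},
                      {'1X': 1220, '12': 1250, '2X': 1900}],
    'League': ['England League 1'] * 4,
})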
Thanks
Upvotes: 4
Views: 480
Reputation: 20659
You can create a boolean mask to filter out rows whose full_time_result has None in both '1' and '2'. To extract the values we can use operator.itemgetter, then use __eq__ to check equality, i.e. check whether the extracted tuple is (None, None).
from operator import itemgetter

# Mask rows whose '1' and '2' odds are both None, then keep the rest
m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
df[~m]
# Alternative
# m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__ne__)
# df[m]
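Applied to the four rows in the question, the mask and result would look like this (output assuming the dicts shown above):

m
# 0    False
# 1    False
# 2    False
# 3     True
# Name: full_time_result, dtype: bool

df[~m]  # keeps rows 0-2 and drops the Ipswich vs Swindon row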
# Breaking it down (here _ stands for df['full_time_result']):
_.map(itemgetter('1', '2')).map((None, None).__eq__)

# All of this can also be written with a lambda in a single line:
_.map(lambda x: itemgetter('1', '2')(x).__eq__((None, None)))
example_dict = {'1': 10, '2': 20}
itemgetter('1', '2')(example_dict)
# (10, 20)

# Since you want to identify values that are None, we can leverage __eq__
itemgetter('1', '2')(example_dict).__eq__((10, 20))
# True  # equivalent to (10, 20) == (10, 20)
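And the same check against the row with missing odds (the None-filled dict from the question):

none_dict = {'1': None, 'X': None, '2': None}
itemgetter('1', '2')(none_dict)
# (None, None)
itemgetter('1', '2')(none_dict).__eq__((None, None))
# True  # this is the row the mask flags for removal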
# Benchmarking setup
import pandas as pd

s = pd.Series([{'1': 10, '2': 20}, {'1': None, '2': None}, {'1': 1, '2': 2}])
df = s.repeat(1_000_000).to_frame('full_time_result')
df.shape
# (3000000, 1)  # 3 million rows, 1 column
# @david's
In [33]: %timeit df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
1.59 s ± 82.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @Ch3steR's
In [34]: %%timeit
...: m = df['full_time_result'].map(itemgetter('1', '2')).map((None, None).__eq__)
...: df[~m]
...:
834 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
≈ 2× faster than using the lambda approach
Upvotes: 2
Reputation: 16683
With lambda x: you are going through each row of the specified column. From there, you can perform normal Python operations like any() and access the values() of each row's dictionary to check whether any are equal to None. That returns True for the rows you want to drop, so we filter out those True results with ~:
df[~df['full_time_result'].apply(lambda x: any([True for v in x.values() if v == None]))]
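A slightly more idiomatic sketch of the same idea, using a generator expression and an identity check instead of == None:

mask = df['full_time_result'].apply(lambda x: any(v is None for v in x.values()))
df[~mask]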
Upvotes: 0