Python List Filtering Removes Too Many

Question

I have a Python list of URLs as strings. I'm trying to remove every string that contains two consecutive forward slashes (//) in its path. Here is how I am attempting to do it:

filtered_list = [x for x in original_list if 'https://www.ourlads.com/ncaa-football-depth-charts/player//' not in x]

However, when I run this, it removes all the strings with // and also some strings that don't include // at all.

Here is a sample of the original list:

original_list = ['https://www.ourlads.com/ncaa-football-depth-charts/player/devonta-smith/123433',
'https://www.ourlads.com/ncaa-football-depth-charts/player//0',
'https://www.ourlads.com/ncaa-football-depth-charts/player//116922',
'https://www.ourlads.com/ncaa-football-depth-charts/player/alex-leatherwood/123411']

What can I change so that it only removes strings with // in them?

Carl Kristensen · Accepted Answer

Your code works as written on the sample list. Another way to do this is with a regular expression that matches any URL containing a second // after the scheme:

import re

original_list = ['https://www.ourlads.com/ncaa-football-depth-charts/player/devonta-smith/123433',
'https://www.ourlads.com/ncaa-football-depth-charts/player//0',
'https://www.ourlads.com/ncaa-football-depth-charts/player//116922',
'https://www.ourlads.com/ncaa-football-depth-charts/player/alex-leatherwood/123411']

filtered_list = [x for x in original_list if not re.match(r"^https://.*//", x)]
filtered_list

filtered_list:

['https://www.ourlads.com/ncaa-football-depth-charts/player/devonta-smith/123433',
 'https://www.ourlads.com/ncaa-football-depth-charts/player/alex-leatherwood/123411']
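A regex-free variant is also possible. The sketch below assumes the goal is to drop any URL whose path contains //, whatever the prefix, and uses the standard library's urllib.parse.urlsplit so that the // after https: is not counted. The helper name has_double_slash_in_path is made up for illustration and is not part of the original question or answer.

```python
from urllib.parse import urlsplit

original_list = [
    'https://www.ourlads.com/ncaa-football-depth-charts/player/devonta-smith/123433',
    'https://www.ourlads.com/ncaa-football-depth-charts/player//0',
    'https://www.ourlads.com/ncaa-football-depth-charts/player//116922',
    'https://www.ourlads.com/ncaa-football-depth-charts/player/alex-leatherwood/123411',
]


def has_double_slash_in_path(url):
    """Return True if the path part of the URL contains '//'.

    Hypothetical helper: urlsplit separates the scheme and host from the
    path, so the '//' in 'https://' is never inspected.
    """
    return '//' in urlsplit(url).path


# Keep only the URLs whose path has no empty segment.
filtered_list = [x for x in original_list if not has_double_slash_in_path(x)]
print(filtered_list)
```

On the sample list this should keep the same two URLs as the regex version. If the filter still removes too much on the real data, printing the removed items with repr() may show how they differ from what was expected.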
