Reputation: 41
I have a text:
text = '''
Wales greatest moment. Lille is so close to the Belgian
border,
this was essentially a home game for one of the tournament favourites. Their
confident supporters mingled with their new Welsh fans on the streets,
buying into the carnival spirit - perhaps more relaxed than some might have
been before a quarter-final because they thought this was their time.
In the driving rain, Wales produced the best performance in their history to
carry the nation into uncharted territory. Nobody could quite believe it.'''
I have a code:
words = text.replace('.',' ').replace(',',' ').replace('\n',' ').split(' ')
print(words)
And Output:
['Wales', 'greatest', 'moment', '', 'Lille', 'is', 'so', 'close', 'to', 'the', 'Belgian', 'border', '', '', 'this', 'was', 'essentially', 'a', 'home', 'game', 'for', 'one', 'of', 'the', 'tournament', 'favourites', '', 'Their', '', 'confident', 'supporters', 'mingled', 'with', 'their', 'new', 'Welsh', 'fans', 'on', 'the', 'streets', '', '', 'buying', 'into', 'the', 'carnival', 'spirit', '-', 'perhaps', 'more', 'relaxed', 'than', 'some', 'might', 'have', '', 'been', 'before', 'a', 'quarter-final', 'because', 'they', 'thought', 'this', 'was', 'their', 'time', '', 'In', 'the', 'driving', 'rain', '', 'Wales', 'produced', 'the', 'best', 'performance', 'in', 'their', 'history', 'to', '', 'carry', 'the', 'nation', 'into', 'uncharted', 'territory', '', 'Nobody', 'could', 'quite', 'believe', 'it', '']
You can see, list have empty spaces, I remove '\n'
, ','
and '.'
.
But now I have not idea how to remove this spaces.
Upvotes: 2
Views: 146
Reputation: 304
EDIT:
The original answer does not product the same output as mentioned in the comments, because of the dash symbol, to avoid that:
import re
words = re.findall(r'[\w-]+', text)
Original Answer
You can directly get what you want with the re
module
import re
words = re.findall(r'\w+', text)
['Wales',
'greatest',
'moment',
'Lille',
'is',
'so',
'close',
'to',
'the',
'Belgian',
'border',
'this',
'was',
'essentially',
'a',
'home',
'game',
'for',
'one',
'of',
'the',
'tournament',
'favourites',
'Their',
'confident',
'supporters',
'mingled',
'with',
'their',
'new',
'Welsh',
'fans',
'on',
'the',
'streets',
'buying',
'into',
'the',
'carnival',
'spirit',
'perhaps',
'more',
'relaxed',
'than',
'some',
'might',
'have',
'been',
'before',
'a',
'quarter',
'final',
'because',
'they',
'thought',
'this',
'was',
'their',
'time',
'In',
'the',
'driving',
'rain',
'Wales',
'produced',
'the',
'best',
'performance',
'in',
'their',
'history',
'to',
'carry',
'the',
'nation',
'into',
'uncharted',
'territory',
'Nobody',
'could',
'quite',
'believe',
'it']
Upvotes: 2
Reputation: 59238
You can filter them, if you don't like them
no_empties = list(filter(None, words))
If function is
None
, the identity function is assumed, that is, all elements of iterable that are false are removed.
This works because empty elements are considered falsey.
Upvotes: 3
Reputation: 170
The reason you are getting this issue is that your text value is indented in every line with 4 single spaces, not because your code is flawed. You could add .replace(' ','')
to your 'words' logic to fix this if you mean to have 4 single spaces every line, or you could refer to Thomas Weller's solution, which will solve the problem no matter how many consecutive single spaces you leave
Upvotes: 1