Reputation: 141
Hi everyone, I am pretty new to Python and wanted some help. I have some sample data and want to know how to get rid of the extra spacing in each string within each list.
data = [
['In dolore .'], ['Voluptatum. '],
['Veniam hic non minima. '],
['Dolores Quis enim'],
[' sequi ducimus']
]
print data
The output I desire:
data = [
['In dolore.'], ['Voluptatum.'],
['Veniam hic non minima.'],
['Dolores Quis enim'],
['sequi ducimus']
]
Here are the two ways I thought would work but didn't:
for i in data:
str = ''.join(data)
final_data = str.replace(" ","")
print final_data
My final attempt was this:
final_data = ''.join(data)
final_data.replace(" ", "")
print final_data
Upvotes: 1
Views: 98
Reputation: 4634
You can use regex here:
import re
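for i in range(len(data)):
    # collapse 2+ whitespace characters between word characters to one space
    data[i][0] = re.sub(r'(\w)\s\s+(\w)', r'\1 \2', data[i][0])
    # remove any remaining run of 2+ whitespace characters
    data[i][0] = re.sub(r'\s\s+', r'', data[i][0])
    # remove a single whitespace character before a period
    data[i][0] = re.sub(r"(\w)\s([.])", r"\1\2", data[i][0])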
The regex pattern \s\s+ matches every run of 2 or more whitespace characters. On its own, that would also eliminate the spacing between two words separated by more than one space. The first substitution, r'(\w)\s\s+(\w)' with r'\1 \2', takes care of that by replacing the run of whitespace between two word characters with a single space. The last substitution removes a single whitespace character between a word character and a period.
Also note that the index is data[i][0] because, strangely, the data is a list of single-element lists.
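As a rough sketch of how the three substitutions behave together, here is the same logic wrapped in a helper and run on a few strings. The helper name `tidy` and the sample strings with doubled spaces are made up for illustration; they are not from the question.

```python
import re

def tidy(s):
    # hypothetical helper name, not from the question
    s = re.sub(r'(\w)\s\s+(\w)', r'\1 \2', s)  # 2+ whitespace between word chars -> one space
    s = re.sub(r'\s\s+', '', s)                # drop any remaining run of 2+ whitespace
    s = re.sub(r'(\w)\s([.])', r'\1\2', s)     # drop a single whitespace before a period
    return s

# made-up samples with extra spaces
samples = ['In  dolore .', 'Voluptatum.   ', '   sequi  ducimus']
cleaned = [tidy(s) for s in samples]
print(cleaned)
# ['In dolore.', 'Voluptatum.', 'sequi ducimus']
```

Note that a lone leading or trailing space is not matched by any of the three patterns, so calling .strip() on the result may still be needed for that case.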
Upvotes: 5
Reputation: 1871
import re
final_data = [[re.sub(r'\s+\.', '.', re.sub(r'\s+', ' ', s)).strip()] for row in data for s in row]
print final_data
[['In dolore.'], ['Voluptatum.'], ['Veniam hic non minima.'], ['Dolores Quis enim'], ['sequi ducimus']]
This way leading and trailing whitespace is removed, while internal spaces are not eliminated entirely.
The inner substitution replaces each run of whitespace characters with a single space, and the outer substitution eliminates whitespace characters before a period. The strip method eliminates leading and trailing whitespace.
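To make the order of operations easier to follow, here is a small sketch that applies the same three steps one at a time to a single string. The sample string and the names step1, step2 and step3 are assumptions added for illustration only.

```python
import re

s = '  Veniam   hic non minima .  '   # made-up sample with extra spaces

step1 = re.sub(r'\s+', ' ', s)        # every whitespace run becomes one space
step2 = re.sub(r'\s+\.', '.', step1)  # whitespace before a period is removed
step3 = step2.strip()                 # leading/trailing whitespace is trimmed

print(repr(step1))  # ' Veniam hic non minima . '
print(repr(step2))  # ' Veniam hic non minima. '
print(repr(step3))  # 'Veniam hic non minima.'
```

Doing the strip last means it only has to deal with the single space left at each end by the first substitution.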
Upvotes: 2