Could you please shed some light on this?
The spaces are not handled properly (tests C and E), and I don't understand what is wrong.
Thanks a lot.
import pandas as pd

foo = {'testing': ['this is test A', ' this is test B', ' this is test C ', ' this is test D', ' this is test E ']}
foo = pd.DataFrame(foo, columns=['testing'])
print("Before:")
print(foo, "\n")
foo.replace(r'\s+', ' ', regex=True, inplace=True)
print("After:")
print(foo)
Before:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
After:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
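The behaviour can be reproduced on a single string. Replacing every run of whitespace with one space also turns a leading or trailing run into one space, so the edges never disappear. A minimal sketch with Python's re module, using a hypothetical input string (the exact whitespace in the question's data is not visible above):

```python
import re

# Hypothetical input: assumed extra spaces at both ends and inside
s = "   this is   test C   "

# Collapsing runs of whitespace keeps one space at each end
collapsed = re.sub(r"\s+", " ", s)
print(repr(collapsed))  # ' this is test C '

# Stripping the ends as well removes the leftover edge spaces
fixed = re.sub(r"\s+", " ", s).strip()
print(repr(fixed))  # 'this is test C'
```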
You can do it without regex:
foo["testing"] = foo["testing"].str.split().str.join(" ")
print(foo)
Prints:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
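This works because Series.str.split() with no separator behaves like Python's str.split(): it splits on runs of any whitespace (spaces, tabs, newlines) and discards empty strings at the ends, so joining the pieces with a single space both collapses and trims. A small sketch on assumed sample values, contrasting it with split(" "):

```python
import pandas as pd

# Hypothetical values with extra spaces, a tab and a newline
s = pd.Series(["  this is   test C  ", "this\tis test\nD"])

# No separator: split on any whitespace run, no empty pieces at the ends
normalized = s.str.split().str.join(" ")
print(normalized.tolist())  # ['this is test C', 'this is test D']

# An explicit " " separator keeps an empty string for every extra space
print("  a  b ".split(" "))  # ['', '', 'a', '', 'b', '']
```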
# remove leading and trailing whitespace first, then use a regex to collapse runs of whitespace inside the strings
foo['testing'] = foo['testing'].str.strip().str.replace(r'\s+', ' ', regex=True)
print(foo)
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
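One practical difference between the vectorized .str methods used here and a plain Python loop is that the .str accessor passes missing values through instead of raising. A short sketch with a hypothetical Series that contains a missing entry:

```python
import pandas as pd

# Hypothetical column with a missing value
s = pd.Series(["  this is   test C  ", None])

# .str.strip() and .str.replace() skip missing entries rather than raising
out = s.str.strip().str.replace(r"\s+", " ", regex=True)
print(out.tolist())

# A plain loop fails on the missing entry: a missing value (None or NaN) has no .strip()
try:
    [x.strip() for x in s]
except AttributeError as exc:
    print("loop failed:", exc)
```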
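It's probably easier to clean the strings in the dictionary before constructing the DataFrame. You also need to account for leading and trailing whitespace: replacing \s+ with a single space collapses runs of whitespace but still leaves one space at either end of a string.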
import pandas as pd
import re
foo={'testing':['this is test A',' this is test B',' this is test C ',' this is test D',' this is test E ']}
foo['testing'] = [re.sub(r'\s+', ' ', s.strip()) for s in foo['testing']]  # raw string avoids the invalid "\s" escape warning
foo = pd.DataFrame(foo, columns=['testing'])
print(foo)
Output:
testing
0 this is test A
1 this is test B
2 this is test C
3 this is test D
4 this is test E
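If the dictionary has several string columns, the same pre-processing can be wrapped in a small helper and applied to every key. A sketch, where normalize, WS and the extra notes column are hypothetical names introduced for illustration; compiling the pattern once avoids re-parsing it for every string:

```python
import re

import pandas as pd

# Hypothetical raw data with a second column
raw = {
    "testing": ["  this is   test A", "this is test B  "],
    "notes": ["  first   note ", "second\tnote"],
}

WS = re.compile(r"\s+")  # hypothetical name for the compiled pattern

def normalize(s: str) -> str:
    # Strip the ends, then collapse inner runs of whitespace to one space
    return WS.sub(" ", s.strip())

# Clean every column's list before building the DataFrame
cleaned = {key: [normalize(s) for s in values] for key, values in raw.items()}
df = pd.DataFrame(cleaned)
print(df)
```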