Reputation: 139

Removing unnecessary spaces in text

Could you please shed some light on this?
The spaces are not handled properly (tests C and E), and I don't understand what is wrong.
Thanks a lot.

import pandas as pd

foo = {'testing': ['this    is test A', '  this is test B', ' this is test C ', '   this is test D', '   this is test E  ']}
foo = pd.DataFrame(foo, columns=['testing'])
print("Before:")
print(foo,"\n")
foo.replace(r'\s+', ' ', regex=True,inplace=True)
print("After:")
print(foo)

Before:
               testing
0    this    is test A
1       this is test B
2      this is test C 
3       this is test D
4     this is test E   

After:
            testing
0    this is test A
1    this is test B
2   this is test C 
3    this is test D
4   this is test E

Upvotes: 1

Answers (3)

Andrej Kesely

Reputation: 195428

You can do it without regex:

foo["testing"] = foo["testing"].str.split().str.join(" ")
print(foo)

Prints:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
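
As a rough illustration of why this works, the same idea can be tried on plain Python strings: `str.split()` called with no arguments splits on runs of whitespace and ignores whitespace at both ends, so joining the pieces with a single space normalizes the text. The helper name `normalize_spaces` below is only an assumption for the sketch, not part of pandas:

```python
def normalize_spaces(text: str) -> str:
    # split() with no separator splits on any run of whitespace and
    # skips leading/trailing whitespace, so no empty pieces are produced
    return " ".join(text.split())


samples = [
    'this    is test A',
    '  this is test B',
    ' this is test C ',
    '   this is test D',
    '   this is test E  ',
]

cleaned = [normalize_spaces(s) for s in samples]
print(cleaned)
```

The pandas `.str.split().str.join(" ")` chain applies the same two steps to every element of the column.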

Upvotes: 0

cottontail

Reputation: 23081

# remove leading and trailing space first; then use regex to replace space inside the strings
foo['testing'] = foo['testing'].str.strip().str.replace(r'\s+', ' ', regex=True)
print(foo)
          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
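
As for why the original attempt leaves test C and test E untouched at the edges: `\s+` also matches the run of spaces at the start and end of a string, and the replacement turns each run into one space instead of removing it. A small sketch with the standard `re` module (variable names are just for illustration) shows the difference:

```python
import re

s = '   this is test E  '

# Collapsing alone: the leading and trailing runs each become ONE space
collapsed = re.sub(r'\s+', ' ', s)
print(repr(collapsed))  # ' this is test E '

# Stripping first removes the edge whitespace, then the inner runs collapse
stripped = re.sub(r'\s+', ' ', s.strip())
print(repr(stripped))  # 'this is test E'
```

The leftover edge spaces are hard to spot in the printed DataFrame because pandas right-aligns the column.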

Upvotes: 3

Adon Bilivit

Reputation: 26976

It's probably easier to process the dictionary before constructing the dataframe. You also need to account for leading space in any of the strings.

import pandas as pd
import re

foo={'testing':['this    is test A','  this is test B',' this is test C ','   this is test D','   this is test E  ']}

# use a raw string so that \s is not treated as an invalid string escape
foo['testing'] = [re.sub(r'\s+', ' ', s.strip()) for s in foo['testing']]

foo = pd.DataFrame(foo, columns=['testing'])

print(foo)

Output:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E

Upvotes: 3
