gracenz

Reputation: 139

Removing unnecessary spaces in text


Could you please shed some light on this?
The spaces are not handled properly (tests C and E), and I don't understand what is wrong.
Thanks a lot.

import pandas as pd

foo = {'testing': ['this    is test A', '  this is test B', ' this is test C ', '   this is test D', '   this is test E  ']}
foo = pd.DataFrame(foo, columns=['testing'])
print("Before:")
print(foo,"\n")
foo.replace(r'\s+', ' ', regex=True,inplace=True)
print("After:")
print(foo)

Before:
               testing
0    this    is test A
1       this is test B
2      this is test C 
3       this is test D
4     this is test E   

After:
            testing
0    this is test A
1    this is test B
2   this is test C 
3    this is test D
4   this is test E 
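Printing the `repr()` of each value, rather than the DataFrame, makes the leftover whitespace visible. The sketch below applies the same pattern with plain `re.sub`, standing in for `DataFrame.replace(..., regex=True)`, which should perform the equivalent substitution on each cell. `\s+` replaces each run of whitespace with one space but never deletes it, so every string that began or ended with whitespace keeps a single space there. That includes B and D, whose leading space is easy to miss in the DataFrame printout.

```python
import re

# The strings from the question's "testing" column
samples = ['this    is test A', '  this is test B', ' this is test C ',
           '   this is test D', '   this is test E  ']

# Same pattern as the question: collapse each run of whitespace to one space
collapsed = [re.sub(r'\s+', ' ', s) for s in samples]

for s in collapsed:
    print(repr(s))
# 'this is test A'
# ' this is test B'
# ' this is test C '
# ' this is test D'
# ' this is test E '
```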

Upvotes: 1

Views: 98

Answers (3)

Andrej Kesely

Reputation: 195428

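You can do it without regex. `str.split()` with no arguments splits on runs of whitespace and ignores leading and trailing whitespace, so joining the pieces back with a single space normalizes each string: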

foo["testing"] = foo["testing"].str.split().str.join(" ")
print(foo)

Prints:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
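The same idea works on a single plain Python string. This is a minimal sketch; `normalize_spaces` is a hypothetical helper name, not part of pandas. Because `split()` without a separator treats any whitespace (spaces, tabs, newlines) as a delimiter and never yields empty pieces, tabs and newlines are normalized too, and an all-whitespace string becomes empty.

```python
def normalize_spaces(text: str) -> str:
    """Collapse internal whitespace to single spaces and drop it at both ends.

    str.split() with no separator splits on runs of any whitespace and
    discards leading/trailing whitespace, so joining with ' ' normalizes it.
    """
    return " ".join(text.split())


print(repr(normalize_spaces('   this is test E  ')))  # 'this is test E'
print(repr(normalize_spaces('tab\there\nnewline')))   # 'tab here newline'
print(repr(normalize_spaces('   ')))                  # ''
```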

Upvotes: 0

cottontail

Reputation: 23081

# strip leading and trailing whitespace first, then collapse each remaining run of whitespace to a single space
foo['testing'] = foo['testing'].str.strip().str.replace(r'\s+', ' ', regex=True)
print(foo)

Output:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
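Both steps are needed: collapsing alone leaves one space at each end, and stripping alone leaves the internal runs. As a quick sanity check, here is a sketch using plain `re` and `str` methods instead of the pandas `.str` accessor (`strip_then_collapse` is a hypothetical name). On the question's strings, strip-then-collapse agrees with the split/join approach.

```python
import re

samples = ['this    is test A', '  this is test B', ' this is test C ',
           '   this is test D', '   this is test E  ']


def strip_then_collapse(s: str) -> str:
    # Step 1: remove leading/trailing whitespace.
    # Step 2: collapse every inner run of whitespace to a single space.
    return re.sub(r'\s+', ' ', s.strip())


results = [strip_then_collapse(s) for s in samples]

# Should agree with ' '.join(s.split()) on every input
print(results == [' '.join(s.split()) for s in samples])  # True
print(results)
```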

Upvotes: 3

Adon Bilivit

Reputation: 26976

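It's probably easier to clean the strings in the dictionary before constructing the dataframe. You also need to remove leading and trailing whitespace from each string: replacing `\s+` with a single space only shrinks those runs to one space, it doesn't remove them.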

import pandas as pd
import re

foo={'testing':['this    is test A','  this is test B',' this is test C ','   this is test D','   this is test E  ']}

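# strip both ends, then collapse inner whitespace runs; the raw string avoids an invalid-escape warning for '\s'
foo['testing'] = [re.sub(r'\s+', ' ', s.strip()) for s in foo['testing']]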

foo = pd.DataFrame(foo, columns=['testing'])

print(foo)

Output:

          testing
0  this is test A
1  this is test B
2  this is test C
3  this is test D
4  this is test E
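If the data has several columns, the same pre-processing can be applied to every string in the dictionary before it reaches `pd.DataFrame`. This is a minimal sketch, assuming a dict of lists like the question's `foo`; `clean_columns` is a hypothetical helper, and non-string values are passed through untouched. Compiling the pattern once with `re.compile` is mostly for readability, since `re` also caches recently used patterns internally.

```python
import re

_WS = re.compile(r'\s+')  # any run of whitespace


def clean_columns(data: dict) -> dict:
    """Return a copy of a dict-of-lists with every string value normalized.

    Strings are stripped and inner whitespace runs collapsed to one space;
    non-string values (numbers, None) are passed through unchanged.
    """
    return {
        col: [_WS.sub(' ', v.strip()) if isinstance(v, str) else v for v in values]
        for col, values in data.items()
    }


raw = {'testing': ['this    is test A', ' this is test C '], 'n': [1, 2]}
cleaned = clean_columns(raw)
print(cleaned)  # {'testing': ['this is test A', 'this is test C'], 'n': [1, 2]}
# The result can then be passed to pd.DataFrame(cleaned) as in the answer above.
```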

Upvotes: 3
