Reputation: 3
I have a .txt file formatted as shown below. Is there a handy way to read in the data so that only "real" \s+ runs act as separators? That is, single spaces should not be treated as separators, but multiple spaces should. Right now pandas creates a separate column for every string, resulting in 4 columns instead of 3.
Thanks for any help or idea!
Hello World    3    2
Banana Pancakes    4    2
Upvotes: 0
Views: 47
Reputation: 41
Building on Hari's answer, you can use re.split() with his suggested regex pattern:
>>> import re
>>> line = "Hello World    3    2"
>>> pat = re.compile(r'\s\s+')
>>> pat.split(line)
['Hello World', '3', '2']
Upvotes: 1
Reputation: 6935
Try this:
import re
s = 'Hello World    3    2'
# Split only on runs of two or more whitespace characters
list_ = re.split(r'\s{2,}', s)
print(list_)
Output:
['Hello World', '3', '2']
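To apply the same split to a whole file rather than one string, something along these lines might work. It is only a sketch: the helper name split_columns and the sample text are made up for illustration, and it assumes every line uses at least two spaces between columns. Each line is split separately, because \s also matches newlines.

```python
import re

# Two or more whitespace characters count as a column break.
SEPARATOR = re.compile(r"\s{2,}")


def split_columns(text):
    """Split each non-empty line into fields on runs of 2+ whitespace."""
    rows = []
    for line in text.splitlines():
        line = line.strip()  # avoid empty leading/trailing fields
        if line:
            rows.append(SEPARATOR.split(line))
    return rows


# Hypothetical sample mirroring the data in the question.
sample = "Hello World    3    2\nBanana Pancakes    4    2\n"
rows = split_columns(sample)
print(rows)
```

For a real file you would pass the result of open(path).read() instead of the sample string. The fields come back as strings, so numeric columns still need converting with int() or float().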
Upvotes: 1
Reputation: 482
I suggest you use the regex "\s\s+" as the separator.
It matches runs of two or more whitespace characters, so a single space is never treated as a separator.
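With pandas, that pattern can be passed straight to read_csv as sep. A rough sketch, assuming the file has no header row; the column names "name", "a" and "b" are invented here, and io.StringIO stands in for the real file path:

```python
import io

import pandas as pd

# Stand-in for the .txt file; columns are separated by 2+ spaces.
data = "Hello World    3    2\nBanana Pancakes    4    2\n"

df = pd.read_csv(
    io.StringIO(data),        # replace with the path to your .txt file
    sep=r"\s\s+",             # split only on runs of two or more whitespace chars
    engine="python",          # regex separators need the python parser engine
    header=None,              # assumption: the file has no header row
    names=["name", "a", "b"], # hypothetical column names
)
print(df)
```

Passing engine="python" explicitly avoids the warning pandas emits when it falls back from the C parser for a regex separator.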
Upvotes: 1