Reputation: 93
I have a particularly long, nasty string that looks something like this:
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '
and so on. The key defining feature is that each "nameOfString" is followed by a \n
with two spaces after it. The first nameOfString has two spaces in front of it as well.
I'm trying to create a list that would look something like this:
niceList = [nameOfString1, Inc_(stuff), nameOfString2, Inc_(Stuff)]
and so on.
I've tried newString = nastyString.split() as well as newString = nastyString.replace('\n ', ''), but neither gives me what I want, because each nameOfString has a space after the comma and before the 'I' of Inc. Furthermore, not all the nameOfStrings have an 'Inc', but most do have some sort of space in their name.
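For reference, here is a short sketch of what those two attempts return on the sample string above. The variable names by_whitespace and joined are only illustrative and are not part of my real code.

```python
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '

# split() with no arguments breaks on every run of whitespace,
# so each name is cut at the space that follows its comma
by_whitespace = nastyString.split()
print(by_whitespace)

# replace() only deletes each newline-plus-space pair;
# the result is still a single string, not a list
joined = nastyString.replace('\n ', '')
print(joined)
```

The first call yields four fragments (with the commas still attached to the names), and the second yields one long string with the two entries run together.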
Would really appreciate some guidance or direction on how I could tackle this issue, thanks!
Upvotes: 0
Views: 73
Reputation: 466
If you would rather not replace '\n' directly, you can do this:
import re
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '
# '.' matches any character except a newline, so findall() drops every '\n'
chars = re.findall(r'.', nastyString)
s = "".join(chars)
print(s)
output: ' nameOfString1, Inc_(stuff) nameOfString2, Inc_(stuff) '
Now you can use split():
print(s.split(','))
Upvotes: 1
Reputation: 14021
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '
# replace '\n' with ','
nastyString = nastyString.replace('\n', ',')
# split at ',' and `strip()` all extra spaces
niceList = [v.strip() for v in nastyString.split(',') if v.strip()]
output:
niceList
['nameOfString1', 'Inc_(stuff)', 'nameOfString2', 'Inc_(stuff)']
Update: OP shared new input:
That's awesome, never knew about the strip function. However, I am actually trying to include the "Inc" section, so I was hoping for output of: ['nameOfString1, Inc_(stuff)', 'nameOfString2, Inc_(stuff)'] and so on, any advice?
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '
niceList = [v.strip() for v in nastyString.split('\n') if v.strip()]
new output:
niceList
['nameOfString1, Inc_(stuff)', 'nameOfString2, Inc_(stuff)']
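A variant sketch of the same idea, assuming every entry sits on its own line: str.splitlines() does the line splitting without naming '\n' explicitly, and it also handles '\r\n' line endings. The loop variable name line is just illustrative.

```python
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '

# splitlines() breaks the string at line boundaries; strip() trims the
# padding spaces, and the `if` filter drops the whitespace-only last piece
niceList = [line.strip() for line in nastyString.splitlines() if line.strip()]
print(niceList)
```

Because the comma is never used as a separator here, names without an 'Inc' part, or with extra spaces inside them, should stay intact as well.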
Upvotes: 1
Reputation: 411
Maybe you can try something like this:
[word for word in nastyString.replace("\n", "").replace(",", "").strip().split(' ') if word !='']
Output:
['nameOfString1', 'Inc_(stuff)', 'nameOfString2', 'Inc_(stuff)']
Upvotes: 2
Reputation: 71471
You can use regular expressions:
import re
nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '
# raw string so that \s reaches the regex engine without an invalid-escape warning
new_string = [i for i in re.split(r"[\n\s,]", nastyString) if i]
Output:
['nameOfString1', 'Inc_(stuff)', 'nameOfString2', 'Inc_(stuff)']
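If the goal is instead the one from the OP's follow-up comment (keeping ', Inc_(stuff)' attached to each name), one possible adaptation is to split only on the newline together with any whitespace around it. This is a sketch under that assumption; the name whole_names is hypothetical.

```python
import re

nastyString = ' nameOfString1, Inc_(stuff)\n nameOfString2, Inc_(stuff)\n '

# strip() removes the leading spaces and the trailing '\n ' first;
# the pattern then splits on each remaining newline plus surrounding whitespace,
# leaving commas and single spaces inside a name untouched
whole_names = re.split(r"\s*\n\s*", nastyString.strip())
print(whole_names)
```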
Upvotes: 1