Splitting strings at multiple delimiters in Python

Question

I have a file text.txt which contains the following:

apple boy 'cat'
dog, egg fin
goat hat ice!

I need to split the text on spaces and special characters, ignoring newlines, so that the output is an array like this:

["apple", "boy", "'", "cat", "'", "dog", "egg", "fin", "goat", "hat", "ice", "!"]

So far, though, my code produces something like this. It returns the string one character at a time and even retains the spaces:

["a", "p", "p", "l", "e", "b", "o", "y", "'", "c", "a", "t", "'", "\n", "d", "o", "g", "e", "g", "g", "f", "i", "n", "\n", "g", "o", "a", "t", "h", "a", "t", "i", "c", "e", "!", "\n"]

Here is my code:

file=open(text.txt)

for i in file:
        i.split(" ")
        b+=i


print b

What can I do if importing modules is not allowed, especially the re module?

Padraic Cunningham · Accepted Answer

Build a temp string for each line, wrapping each non-alphanumeric character in a space on both sides, then split the temp string on whitespace:

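lines = """apple boy 'cat'
dog, egg fin
goat hat ice!"""

out = []
for line in lines.splitlines():
    temp = ""
    for ch in line:
        if ch.isalnum():
            temp += ch  # letters and digits stay together
        else:
            temp += " {} ".format(ch)  # pad everything else with spaces
    out.extend(temp.split())  # split() with no argument drops all whitespace
print(out)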

Output:

['apple', 'boy', "'", 'cat', "'", 'dog', ',', 'egg', 'fin', 'goat', 'hat', 'ice', '!']
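
As for why the loop in the question printed one character at a time, here is a minimal sketch. It assumes b started out as an empty list, since the question does not show where b is defined. str.split returns a new list rather than changing the string, so its result was thrown away, and += on a list extends it with each character of the string:

```python
line = "dog, egg fin\n"

# Assumption: b was initialised as an empty list somewhere before the loop.
b = []
line.split(" ")      # returns a new list; the result is discarded here
b += line            # list += str extends the list one character at a time
print(b[:5])         # ['d', 'o', 'g', ',', ' ']

# Keeping the result of split() gives whole words instead.
words = []
words += line.split()   # no argument: split on any whitespace, drop the newline
print(words)            # ['dog,', 'egg', 'fin']
```

That still leaves the comma glued to "dog", which is what the padding with spaces takes care of.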

Using your file is just a matter of iterating over the file object and applying the same logic:

with open("text.txt") as f:
    out = []
    for line in f:
        temp = ""
        for ch in line:
            if ch.isalnum():
                temp += ch
            else:
                temp += " {} ".format(ch)
        out.extend(temp.split())
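
The same idea can also be wrapped in a function that collects tokens directly instead of going through a temp string. This is only a sketch: the name tokenize is made up for illustration, and it accepts any iterable of lines, so a file object or a list of strings should both work.

```python
def tokenize(lines):
    """Split lines into words and single non-alphanumeric characters."""
    out = []
    for line in lines:
        word = ""
        for ch in line:
            if ch.isalnum():
                word += ch            # still inside a word
                continue
            if word:                  # a word just ended, so store it
                out.append(word)
                word = ""
            if not ch.isspace():      # keep punctuation, drop whitespace
                out.append(ch)
        if word:                      # word that runs to the end of the line
            out.append(word)
    return out


sample = ["apple boy 'cat'\n", "dog, egg fin\n", "goat hat ice!"]
print(tokenize(sample))
```

With the file from the question, it could be called as out = tokenize(f) inside the with open("text.txt") as f: block.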

You could also use a set of punctuation characters and check whether each character appears in the set instead:

st = set("""!"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~""")
with open("text.txt") as f:
    out = []
    for line in f:
        temp = ""
        for ch in line:
            if ch not in st:
                temp += ch
            else:
                temp += " {} ".format(ch)
        out.extend(temp.split())
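
The two checks are not interchangeable for every input. Here is a small comparison, assuming the text may contain symbols outside the ASCII punctuation listed above. The helper names split_isalnum and split_punct are made up for this sketch:

```python
PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")


def split_isalnum(line):
    # Wrap every non-alphanumeric character in spaces, then split.
    temp = ""
    for ch in line:
        temp += ch if ch.isalnum() else " {} ".format(ch)
    return temp.split()


def split_punct(line):
    # Wrap only the characters listed in PUNCT in spaces, then split.
    temp = ""
    for ch in line:
        temp += " {} ".format(ch) if ch in PUNCT else ch
    return temp.split()


print(split_isalnum("ice! 5€"))  # ['ice', '!', '5', '€']
print(split_punct("ice! 5€"))    # ['ice', '!', '5€']
```

The isalnum check separates any symbol, while the set only separates the characters you list, so pick whichever matches the delimiters you expect.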
