Hulk

Reputation: 34160

Split a string by a delimiter in Python

Consider the following input string:

'MATCHES__STRING'

I want to split that string wherever the "delimiter" __ occurs. This should output a list of strings:

['MATCHES', 'STRING']

To split on whitespace, see How do I split a string into a list of words?.
To extract everything before the first delimiter, see Splitting on first occurrence.
To extract everything before the last delimiter, see Partition string in Python and get value of last segment after colon.

Upvotes: 268

Views: 543406

Answers (6)

adamk

Reputation: 46804

Use the str.split method:

>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']
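
As a small sketch of how str.split behaves beyond the basic case (the sample strings here are made up for illustration): the optional maxsplit argument limits the number of splits, rsplit works from the right, and adjacent delimiters produce empty strings in the result.

```python
s = "MATCHES__STRING__AGAIN"  # hypothetical sample input

all_parts = s.split("__")          # split at every occurrence
first_only = s.split("__", 1)      # maxsplit=1: split at the first occurrence only
last_only = s.rsplit("__", 1)      # rsplit: split at the last occurrence only
with_empty = "A____B".split("__")  # two adjacent delimiters yield an empty string

print(all_parts)   # ['MATCHES', 'STRING', 'AGAIN']
print(first_only)  # ['MATCHES', 'STRING__AGAIN']
print(last_only)   # ['MATCHES__STRING', 'AGAIN']
print(with_empty)  # ['A', '', 'B']
```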

Upvotes: 416

Gnai

Reputation: 37

In Python 3.8, you don't need the get_text method: if ev is already a string, ev.split("@") is enough. In fact, calling get_text on a plain string raises an AttributeError. So if you have a string variable, for example:

filename = 'file/foo/bar/fox'

You can split it into separate variables with commas, as suggested in the other answer that uses get_text, but with a correction:

W, X, Y, Z = filename.split('/')
# W == 'file'
# X == 'foo'
# Y == 'bar'
# Z == 'fox'
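
One caveat with this kind of unpacking is that the number of names on the left must match the number of parts exactly. A minimal sketch of what happens otherwise, and of one possible workaround using a starred name (the variable names here are illustrative only):

```python
filename = "file/foo/bar/fox"

# Unpacking four parts into two names raises ValueError.
mismatch = False
try:
    a, b = filename.split("/")
except ValueError:
    mismatch = True

# A starred name collects all remaining parts into a list.
head, *rest = filename.split("/")

print(mismatch)  # True
print(head)      # file
print(rest)      # ['foo', 'bar', 'fox']
```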

Upvotes: 2

cottontail

Reputation: 23051

When you want to split a string by a specific delimiter such as __, | or a comma, the .split() method shown in the top answer is simpler and faster, because Python's string methods are intuitive and optimized. However, if you need to split on a pattern (e.g. both " __ " and "__"), the built-in re module can be useful.

For the example in the OP:

import re

s1 = "MATCHES__STRING"
s2 = "MATCHES __ STRING"

re.split(r"\s*__\s*", s1)   # ['MATCHES', 'STRING']
re.split(r"\s*__\s*", s2)   # ['MATCHES', 'STRING']

\s* matches zero or more whitespace characters, so the pattern above matches both "__" and " __ ".

If you need to split a list of strings, then compiling the pattern first would be more efficient.

texts = ["a __ b", "c__d__e", "f  __ g"]
pattern = re.compile(r"\s*__\s*")
[pattern.split(s) for s in texts]  
# [['a', 'b'], ['c', 'd', 'e'], ['f', 'g']]
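
Two related variations, sketched under the assumption that the delimiter may contain regex metacharacters or that more than one delimiter is in play (the inputs are made up): re.escape makes a literal delimiter safe to embed in a pattern, and alternation splits on either of several delimiters.

```python
import re

delimiter = "|"  # "|" is a regex metacharacter, so it is escaped before use
pattern = re.compile(r"\s*" + re.escape(delimiter) + r"\s*")
parts = pattern.split("a | b|c")
print(parts)  # ['a', 'b', 'c']

# Alternation in a non-capturing group splits on "__" or ","
mixed = re.split(r"\s*(?:__|,)\s*", "x__y , z")
print(mixed)  # ['x', 'y', 'z']
```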

Upvotes: 0

Katriel

Reputation: 123622

You may be interested in the csv module, which is designed for comma-separated files but can be easily modified to use a custom delimiter.

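import csv

# csv delimiters must be a single character, so "__" raises a TypeError.
# "_" is used here instead, which leaves an empty field between the two underscores.
# Other dialect options can be added as keyword arguments.
csv.register_dialect("myDialect", delimiter="_")
lines = ["MATCHES__STRING", "MATCHES __ STRING"]

for row in csv.reader(lines, dialect="myDialect"):
    print(row)  # ['MATCHES', '', 'STRING'], then ['MATCHES ', '', ' STRING']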

Upvotes: 4

topin89

Reputation: 401

Besides split and rsplit, there are partition and rpartition. They split the string only once, but given how the question was asked, they may apply as well.

Example:

>>> "MATCHES__STRING".partition("__")
('MATCHES', '__', 'STRING')

>>> "MATCHES__STRING".partition("__")[::2]
('MATCHES', 'STRING')

It is also a bit faster than split("_", 1):

$ python -m timeit "'validate_field_name'.split('_', 1)[-1]"
2000000 loops, best of 5: 136 nsec per loop

$ python -m timeit "'validate_field_name'.partition('_')[-1]"
2000000 loops, best of 5: 108 nsec per loop

Timeit lines are based on this answer
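
Another difference worth sketching is what happens when the delimiter is absent (the sample string is made up): partition and rpartition always return a 3-tuple, so unpacking into three names never fails, whereas split returns a one-element list.

```python
s = "NO_DELIMITER_HERE"  # contains single underscores only, never "__"

missing_partition = s.partition("__")
missing_rpartition = s.rpartition("__")
missing_split = s.split("__", 1)

print(missing_partition)   # ('NO_DELIMITER_HERE', '', '')
print(missing_rpartition)  # ('', '', 'NO_DELIMITER_HERE')
print(missing_split)       # ['NO_DELIMITER_HERE']

# When the delimiter is present, the middle item is the delimiter itself.
before, sep, after = "MATCHES__STRING".partition("__")
print(before, after)  # MATCHES STRING
```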

Upvotes: 5

Sergey Nasonov

Reputation: 153

When the string contains two or more elements (three in the example below), you can unpack them into comma-separated variables:

date, time, event_name = ev.get_text(separator='@').split("@")

After this line of code, the three variables will have values from three parts of the variable ev.

So, if the variable ev contains this string and we apply separator @:

Sa., 23. März@19:00@Klavier + Orchester: SPEZIAL

Then, after the split operation the variable

  • date will have value Sa., 23. März
  • time will have value 19:00
  • event_name will have value Klavier + Orchester: SPEZIAL
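
If the last field might itself contain the separator, limiting the number of splits keeps the unpacking safe. A minimal sketch on a plain string, assuming the text has already been extracted (the ev_text value, including the trailing "@ Studio", is hypothetical):

```python
# Hypothetical text in which the event name itself contains an "@"
ev_text = "Sa., 23. März@19:00@Klavier + Orchester: SPEZIAL @ Studio"

# maxsplit=2 produces at most three parts, so the extra "@" stays in event_name
date, time, event_name = ev_text.split("@", 2)

print(date)        # Sa., 23. März
print(time)        # 19:00
print(event_name)  # Klavier + Orchester: SPEZIAL @ Studio
```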

Upvotes: 2
