bunker

Reputation: 99

Split a string on multiple characters - Python

I have a string:

V51M229D180728T132714_ACCEPT_EC_NC

This needs to be split into:

String 1 : V51 (Can be variable but always ends before M)
String 2 : M229 (Can be variable but always ends before D)
String 3 : D180728 (Date in YYMMDD format)
String 4 : 132714 (Timestamp in HHMMSS format)
String 5 : ACCEPT (Occurs between "_")
String 6 : EC (Occurs between "_")
String 7 : NC (Occurs between "_")

I am new to Python and hoping to get some help with this.

Thanks.

Upvotes: 1

Views: 546

Answers (5)

gilch

Reputation: 11641

You probably want to use a regex with matching groups. See the re module.

For example,

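>>> import re
>>> mystr = 'V51M229D180728T132714_ACCEPT_EC_NC'
>>> re.match(r'(.*?)(M.*?)(D.*?)T(.*?)_(.*?)_(.*?)_(.*)', mystr).groups()
('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')

In the pattern, each () marks a capturing group, and .*? matches as few characters as possible while still letting the rest of the pattern match. The last group uses a greedy .* instead, because a lazy .*? at the very end of the pattern would match the empty string.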
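To make the difference between lazy and greedy groups concrete, here is a small sketch on the tail of the same string. The variable names are only for illustration.

```python
import re

text = 'ACCEPT_EC_NC'

# Greedy: the first group grabs as much as it can while still leaving one "_" to match.
greedy = re.match(r'(.*)_(.*)', text).groups()

# Lazy: the first group stops at the earliest "_" that lets the rest match.
lazy = re.match(r'(.*?)_(.*)', text).groups()

# A lazy group at the very end of an unanchored pattern matches nothing,
# because the empty string is already enough for the match to succeed.
trailing_lazy = re.match(r'(.*?)_(.*?)', text).groups()

# Anchoring with $ forces the trailing lazy group to run to the end of the string.
anchored = re.match(r'(.*?)_(.*?)$', text).groups()

print(greedy)         # ('ACCEPT_EC', 'NC')
print(lazy)           # ('ACCEPT', 'EC_NC')
print(trailing_lazy)  # ('ACCEPT', '')
print(anchored)       # ('ACCEPT', 'EC_NC')
```

So when the last group of a pattern is lazy, either anchor the pattern with $ or make that group greedy.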

Upvotes: 0

Ramineni Ravi Teja

Reputation: 3906

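If your data follows a fixed pattern, plain string slicing and list indexing are enough. Note that the hard-coded offsets only work while the V and M parts keep these exact lengths.

  aa = "V51M229D180728T132714_ACCEPT_EC_NC"
  a = aa.split("_")
  str1 = a[0][0:3]    # V51
  str2 = a[0][3:7]    # M229
  str3 = a[0][7:14]   # D180728
  str4 = a[0][15:21]  # 132714 (index 14 is the "T" separator)
  str5 = a[1]
  str6 = a[2]
  str7 = a[3]
  print(str1, str2, str3, str4, str5, str6, str7)

Output

V51 M229 D180728 132714 ACCEPT EC NC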

Upvotes: 1

user8105524

Reputation:

As mxmt said, use regular expressions. Here is another equivalent regex, which might be a little easier to read:

import re

s = 'V51M229D180728T132714_ACCEPT_EC_NC'

pattern = re.compile(r'''
    ^        # beginning of string
    (V\w+)   # first pattern, starting with V
    (M\w+)   # second pattern, starting with M
    (D\d{6}) # third string pattern, six digits starting with D
    T(\d{6}) # time, six digits after T
    _(\w+)
    _(\w+)
    _(\w+)   # final three patterns
    $        # end of string
    ''', re.VERBOSE
)

print(pattern.match(s).groups())
# ('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')
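If the pieces are going to be used by name, or the date and time need to become a real timestamp, the same idea can be extended with named groups. This is only a sketch: the names RECORD and parse_record and the group names are made up here, it assumes the V and M parts are followed by digits only, and unlike the output above it drops the leading D from the date so that datetime.strptime can parse it.

```python
import re
from datetime import datetime

# Hypothetical pattern and group names, chosen for this sketch.
RECORD = re.compile(r'''
    ^(?P<version>V\d+)    # assumed: V followed by digits
    (?P<model>M\d+)       # assumed: M followed by digits
    D(?P<date>\d{6})      # YYMMDD, without the leading D
    T(?P<time>\d{6})      # HHMMSS, without the leading T
    _(?P<status>[^_]+)    # each remaining field runs up to the next "_"
    _(?P<first>[^_]+)
    _(?P<second>[^_]+)$
    ''', re.VERBOSE)


def parse_record(s):
    """Return the fields of s as a dict, with date and time merged into a datetime."""
    m = RECORD.match(s)
    if m is None:
        raise ValueError(f'unrecognised record: {s!r}')
    fields = m.groupdict()
    # Combine YYMMDD and HHMMSS into one datetime object.
    stamp = fields.pop('date') + fields.pop('time')
    fields['timestamp'] = datetime.strptime(stamp, '%y%m%d%H%M%S')
    return fields


record = parse_record('V51M229D180728T132714_ACCEPT_EC_NC')
print(record['version'], record['model'], record['timestamp'], record['status'])
```

Using [^_]+ rather than \w+ for the last three fields keeps each group from running across an underscore, since \w also matches "_".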

Upvotes: 0

Maksim Terpilowski

Reputation: 841

Use the re module:

import re
a = 'V51M229D180728T132714_ACCEPT_EC_NC'
re.search(r'(\w+)(M\w+)(D\d+)T(\d+)_(\w+)_(\w+)_(\w+)', a).groups()

You will get:

('V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC')

Upvotes: 1

Pantone877

Reputation: 584

Use split(). From docs:

str.split(sep=None, maxsplit=-1)

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

So you can call split('M', 1) to get ['V51', '229D180728T132714_ACCEPT_EC_NC'], then split the second entry on 'D' to get ['229', '180728T132714_ACCEPT_EC_NC'], and so on. Keep in mind that split() removes the delimiter, so the leading M and D have to be added back afterwards.
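Carried through to the end, the chain of splits could look roughly like this. It is a sketch that assumes each of M, D and T appears as a marker only once before the first underscore; the variable names are arbitrary.

```python
s = 'V51M229D180728T132714_ACCEPT_EC_NC'

# Everything before the first "_" is the packed header; the rest are plain fields.
head, *tail = s.split('_')

# split() drops the delimiter, so "M" and "D" are added back by hand below.
part1, rest = head.split('M', 1)
part2, rest = rest.split('D', 1)
part3, part4 = rest.split('T', 1)

parts = [part1, 'M' + part2, 'D' + part3, part4, *tail]
print(parts)
# ['V51', 'M229', 'D180728', '132714', 'ACCEPT', 'EC', 'NC']
```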

Hope you get the idea.

Upvotes: 0
