Ryan Stille

Reputation: 1364

How could I split this string in Python?

I'm learning Python (3). I would like to split each of these lines into four separate pieces.

[Test Branch]             bobjones       0         6/13/2008 4:24 PM
[Todd's Workspace]        tfatcher       0         6/16/2008 9:20 AM
[Henry]                   hmckinkley     1         6/17/2008 10:12 AM
[Henry]                   hmckinkley     0         6/17/2008 10:15 AM

I could call line.split() on each one, but then I'd have to put the date back together, and I guess the spaces inside the first [ ] section rule that out anyway. I suppose I could slice it, but I'm not 100% sure this data is as fixed-width as it seems. A regex is probably best, eh? Any pointers on that?

Update: I thought @Selcuk's solution was going to work great:

branch,user,version,timestamp = [commitheaderline.split("]", 1)[0] + "]"] + commitheaderline.split("]", 1)[1].split(None, 2)

But then I encountered some data where the username was too long (example below), so the rest of the data ended up on a new line. I'm working on that now. My plan is to test each line before running split(), and if it doesn't look like a "proper" line, join it with the next one.

[Test Branch]             bobjones       0         6/13/2008 4:24 PM
[Todd's Workspace]        tfatcher       0         6/16/2008 9:20 AM
[cole]                    bob.darknsdale
                                         0        7/27/2012 12:49 PM
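
One way to sketch the "join it with the next one" idea is below. It assumes that every real record starts with "[" in the first column and that a wrapped remainder starts with whitespace, which matches the sample above but may not hold for all of the data. The helper names join_wrapped_lines and split_record are made up for illustration.

```python
def join_wrapped_lines(lines):
    """Merge continuation lines into the record that precedes them.

    Assumption: a real record starts with "[" and a wrapped
    remainder does not.
    """
    records = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith("[") or not records:
            records.append(line)
        else:
            # Continuation: glue it onto the previous record.
            records[-1] += " " + line.strip()
    return records


def split_record(line):
    """Split one record into branch, user, version, timestamp."""
    branch, rest = line.split("]", 1)
    # maxsplit=2 keeps the date and time together in the last field.
    return [branch + "]"] + rest.split(None, 2)


sample = """\
[Test Branch]             bobjones       0         6/13/2008 4:24 PM
[Todd's Workspace]        tfatcher       0         6/16/2008 9:20 AM
[cole]                    bob.darknsdale
                                         0        7/27/2012 12:49 PM
"""

rows = [split_record(r) for r in join_wrapped_lines(sample.splitlines())]
for row in rows:
    print(row)
```

Joining first and splitting afterwards keeps the split itself unchanged, so the same one-liner still applies to every record.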

Upvotes: 2

Views: 219

Answers (3)

rkirmizi

Reputation: 364

In [4]: import re

In [5]: print(text)

[Test Branch]             bobjones       0         6/13/2008 4:24 PM
[Todd's Workspace]        tfatcher       0         6/16/2008 9:20 AM
[Henry]                   hmckinkley     1         6/17/2008 10:12 AM
[Henry]                   hmckinkley     0         6/17/2008 10:15 AM


In [6]: pattern = re.compile(r'(\[.*?\])\s+(\w+)\s+(\d+)\s+(.*?$)', re.M)


In [7]: for match in pattern.finditer(text):
   ...:     # do whatever you want here; the columns are captured as groups
   ...:     print("first col: %s - 4th col: %s" % (match.group(1), match.group(4)))
   ...:
   ...:
first col: [Test Branch] - 4th col: 6/13/2008 4:24 PM
first col: [Todd's Workspace] - 4th col: 6/16/2008 9:20 AM
first col: [Henry] - 4th col: 6/17/2008 10:12 AM
first col: [Henry] - 4th col: 6/17/2008 10:15 AM

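This works with any amount of whitespace (spaces or tabs) between the columns. Note that \w+ does not match a period, so a username such as bob.darknsdale would need \S+ in the second group instead.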
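
For Python 3, the same idea might look like the sketch below. It is a variation rather than the exact pattern above: the second group uses \S+ so usernames containing a period are accepted, and the group names (branch, user, version, timestamp) are my own choice. Because \s+ also matches a newline, a record whose tail wrapped onto the next line should still be picked up as a single match, assuming the wrapped part contains only the remaining columns.

```python
import re

# Named groups make the columns easier to refer to than group(1)..group(4).
LINE_RE = re.compile(
    r"(?P<branch>\[.*?\])\s+(?P<user>\S+)\s+(?P<version>\d+)\s+(?P<timestamp>.*?)$",
    re.M,
)

text = """\
[Test Branch]             bobjones       0         6/13/2008 4:24 PM
[cole]                    bob.darknsdale
                                         0        7/27/2012 12:49 PM
"""

records = [m.groupdict() for m in LINE_RE.finditer(text)]
for rec in records:
    print(rec["branch"], rec["user"], rec["version"], rec["timestamp"], sep=" | ")
```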

Upvotes: 0

sakurashinken

Reputation: 4080

Try

import re
sep = re.split(r" {2,}", line)

This works if the fields are separated by two or more spaces, which also keeps "Test Branch" and the date/time together because those contain only single spaces. If the fields are tab-delimited, try

import re
sep = re.split(r"\t+", line)
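
As a quick check against one of the sample lines from the question, here is a minimal sketch. The variable names are only for illustration, and the line is stripped first so that trailing whitespace does not produce an empty last field.

```python
import re

line = "[Todd's Workspace]        tfatcher       0         6/16/2008 9:20 AM"

# Split on runs of two or more spaces; single spaces stay inside a field.
fields = re.split(r" {2,}", line.strip())
print(fields)
# ["[Todd's Workspace]", 'tfatcher', '0', '6/16/2008 9:20 AM']

branch, user, version, timestamp = fields
```

This relies on every pair of columns being separated by at least two spaces; a record where two columns are only one space apart would come back with fewer than four fields and the unpacking would raise ValueError.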

Upvotes: 2

Selcuk

Reputation: 59184

You could do the following:

[line.split("]", 1)[0] + "]"] + line.split("]", 1)[1].split(None, 2)

which will result in

['[Test Branch]', 'bobjones', '0', '6/13/2008 4:24 PM']

Upvotes: 0
