Reputation: 33
I'm new to this site, and new to Python.
So I'm learning about regular expressions, and I was working through Google's examples here.
I was doing one of the 'Search' examples, but I changed 'Search' to 'Split' and tweaked the search pattern a bit just to play with it. Here's the line:
print(re.split(r'i', 'piiig'))
(notice that there are 3 'i's in the text 'piiig')
The output has only 2 empty strings where it's been split:
['p', '', '', 'g']
I'm just wondering why this gives that output. This isn't a real-life problem and has no practical relevance, but I think I could run into it later on and want to understand what's going on.
Does anybody know why?
Upvotes: 3
Reputation: 25677
Think of it this way (in Java, as I'm not so good with Python):
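import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
    public static void main(String[] args) {
        String Text = "piiig";
        List<String> Spliteds = new ArrayList<String>();
        String Match = "";
        int I;
        char c;
        for (I = 0; I < Text.length(); I++) {
            c = Text.charAt(I);
            if (c == 'i') {
                // Separator found: store what came before it (possibly "")
                Spliteds.add(Match);
                Match = "";
            } else {
                Match += c;
            }
        }
        // Store the last piece; re.split keeps it even when it is empty
        Spliteds.add(Match);
        System.out.println(Spliteds); // [p, , , g]
    }
}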
So when you run it, this is the state at the end of each loop iteration:
When: (I == 0) => c = 'p'; Match = "p"; Spliteds = {};
When: (I == 1) => c = 'i'; Match = ""; Spliteds = {"p"};
When: (I == 2) => c = 'i'; Match = ""; Spliteds = {"p", ""};
When: (I == 3) => c = 'i'; Match = ""; Spliteds = {"p", "", ""};
When: (I == 4) => c = 'g'; Match = "g"; Spliteds = {"p", "", ""};
At the end of the program (after the loop exits and the last piece is added):
(I == 5) => c = 'g'; Match = "g"; Spliteds = {"p", "", "", "g"};
The regex engine simply collects the strings between consecutive 'i's, and that includes the empty string between an 'i' and the 'i' immediately after it.
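For readers more comfortable in Python, the same loop can be sketched directly and compared with re.split. This is a minimal Python 3 sketch that only handles a single-character separator; the function name manual_split is hypothetical and not part of the re module.

```python
import re


def manual_split(text, sep):
    """Split text on a single-character separator, mirroring the Java loop.

    Hypothetical helper for illustration only; re.split handles full patterns.
    """
    pieces = []
    current = ""
    for ch in text:
        if ch == sep:
            # Separator found: emit what accumulated since the last one,
            # even if that is the empty string.
            pieces.append(current)
            current = ""
        else:
            current += ch
    # Always emit the final piece, as re.split does (it may be "").
    pieces.append(current)
    return pieces


print(manual_split("piiig", "i"))  # ['p', '', '', 'g']
print(re.split(r"i", "piiig"))     # ['p', '', '', 'g']
```

Both calls produce the same list, which is the point: each 'i' closes off one piece, so three separators yield four pieces, two of them empty.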
Hope this helps.
Upvotes: 0
Reputation: 138097
split removes each instance of the separator it finds. The two blank strings are the two empty strings between the i's.
If you joined the array back together using i as a separator, you'd get the original string back.
Viewed that way, piiig is p, i, -, i, -, i, g (here I'm using a dash for each empty string).
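The round trip described above can be checked directly: joining the pieces with the separator reproduces the input. A minimal Python 3 sketch; the sample string is just the one from the question.

```python
import re

text = "piiig"
pieces = re.split(r"i", text)
print(pieces)  # ['p', '', '', 'g']

# Each 'i' that split removed sat between two adjacent pieces, so putting
# it back with str.join restores the original string.
print("i".join(pieces))          # piiig
print("i".join(pieces) == text)  # True

# Three separators always produce 3 + 1 = 4 pieces.
print(len(pieces) == text.count("i") + 1)  # True
```

This round trip works because the pattern here is a fixed string; a pattern that can match different text each time would not be reversible with a single join.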
Upvotes: 2
Reputation: 993931
Your example might make more sense if you replace i with a comma:
print(re.split(r',', 'p,,,g'))
In this case, splitting on the comma finds four fields: a 'p', a 'g', and two empty ones ('') in the middle.
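The comma version behaves the same way with re.split and with the plain str.split method, which splits on a literal string without using regular expressions. A minimal Python 3 sketch comparing the two:

```python
import re

line = "p,,,g"

# Splitting on a regex that matches a single comma.
regex_fields = re.split(r",", line)
# str.split with an explicit separator also keeps empty fields.
str_fields = line.split(",")

print(regex_fields)                # ['p', '', '', 'g']
print(regex_fields == str_fields)  # True

# Three commas delimit four fields: 'p', two empty ones, then 'g'.
print(len(regex_fields))           # 4
```

This is the same behaviour you would expect from a CSV-style line with two blank columns, which is why the empty strings are kept rather than discarded.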
Upvotes: 6