Reputation: 6176
I'm a bit confused about a line in Python:
We use Python and a custom function to split a line: we want what is between quotes to be a single entry in the array.
The line is, for example:
"La Jolla Bank, FSB",La Jolla,CA,32423,19-Feb-10,24-Feb-10
So "La Jolla Bank, FSB" should be a single entry in the array.
And I'm not sure to understand this code:
The first char is a quote '"
', so the variable "quote" is set to its inverse, so set to "TRUE".
Then we check the comma, AND if quote is set to its inverse, so if quote is TRUE, which is the case when we are inside the quotes.
We cut it with current=""
, and this is where I don't understand: we are still between the quotes, so normally we should not cut it now! edit: so and not quote means "false", and not "the opposite of", thanks !
Code:
def mysplit (string):
quote = False
retval = []
current = ""
for char in string:
if char == '"':
quote = not quote
elif char == ',' and not quote: #the first coma is still in the quotes, and quote is set to TRUE, so we should not cut current here...
retval.append(current)
current = ""
else:
current += char
retval.append(current)
return retval
Upvotes: 0
Views: 174
Reputation: 7026
OK you aren't quite there!
1.the first char is a quote ' " ', so the variable "quote" is set to its inverse, so set to "TRUE".
good! so quote was set to the inverse of whatever it was previously. At the beginning of the prog, it was false, so when "
is seen, it becomes true. But vice versa, if it was True, and a quote is seen, it becomes false.
In other words, this line of the program changes quote
from whatever is was before that line. It is called 'toggling'.
- then we check the coma, AND if quote is set to its inverse, so if quote is TRUE, which is the case when we are inside the quotes.
This isn't quite right. not quote
means "only if quote is false". This has nothing to do with whether it is 'set to its inverse'. No variable can be equal to its own inverse! it is like saying X=True and X=False
- obviously nonsense.
quote
is always either True
or False
- and nothing else!
3.we cut it with current="", and this is where i don't understand : we are still between the quotes, so norm ally we should not cut it now!
So hopefully you can see now that, you are not between the quotes if you reach this line. the not quote
ensures that you don't cut inside a quote, because not quote
really means just that - not in a quote!
Upvotes: 1
Reputation: 9481
This is not the best code for splitting but it is pretty straight forward
1 current = ""
# First you set current to empty string, the following line
# will loop through the string to be split and pull characters out of it
# one by one... setting 'char' to be the value of next character
2 for char in string:
# the following code will check if the line we are currently inside of the quote
# if otherwise it will add the current character to the the 'current' variable
#
3 if char == '"':
4 quote = not quote
5 elif char == ',' and not quote:
6 retval.append(current)
### if we see the comma, it will append whatever is accumulated in current to the
### return result.
### then you have to reset the value in the current to let the next word accumulate
7 current = "" #why do we cut current here?
8 else:
9 current += char
### after the last char is seen, we still have left over characters in current which
### we can just shove into the final result
10 retval.append(current)
11 return retval
Here is an example run:
Let string be 'a,bbb,ccc
Step char current retval
1 a a {}
2 , {a} ### Current is reset
3 b b {a}
4 b bb {a}
5 b bbb {a}
6 , {a,bbb} ### Current is reset
and so on
Upvotes: 1
Reputation: 59397
What you're looking at is a condensed version of a Finite State Machine, used by most language parsing programs.
Let's see if I can't annotate it:
def mysplit (string):
# We start out at the beginning of the string NOT in between quotes
quote = False
# Hold each element that we split out
retval = []
# This variable holds whatever the current item we're interested in is
# e.g: If we're in a quote, then it's everything (including commas)
# otherwise it's every UP UNTIL the next comma
current = ""
# Scan the string character by character
for char in string:
# We hit a quote, so turn on QUOTE SCANNING MODE!!!
# If we're in quote scanning mode, turn it off
if char == '"':
quote = not quote
# We hit a comma, and we're not in quote scanning mode
elif char == ',' and not quote:
# We got what we want, let's put it in the return value
# and then reset our current item to nothing so we can prepare for the next item.
retval.append(current)
current = ""
else:
# Nothing special, let's just keep building up our current item
current += char
# We're done with all the characters, let's put together whatever we were working on when we ran out of characters
retval.append(current)
# Return it!
return retval
Upvotes: 1
Reputation: 4346
current is reset to empty because in the case where you have encountered ',' and you are not under "" quotes you should interpret that as an end of a "token".
This is definitely not pythonic, for char in string
makes me cringe and whoever wrote this code should have used regex.
Upvotes: 1
Reputation: 40789
You're viewing it as though both if char == '"'
and elif char == ',' and not quote
were run.
However the if statement explicitly makes it so that only one will run.
Either, quote will be inverted OR the current
value will get cut.
In the case where the current char is "
, then the logic will be called to invert the quote
flag. But the logic to cut the string will not run.
In the case where the current char is ,
, then the logic for inverting the flag will NOT run, but the logic to cut the string will if the quote
flag is not set.
Upvotes: 3
Reputation: 12339
That is initializing current
to the empty string, wiping out whatever it may have been set to before.
As long as you are not inside quotes (ie. quote
is False), when you see a ,
, you have hit the end of the field. Whatever you have accumulated into current
is the content of that field, so append it to retval
and reset current
to the empty string, ready for the next field.
That said, this looks like you're dealing with a .csv input. There is a csv module that can deal with this for you.
Upvotes: 1