Zorgmorduk

Reputation: 1365

Count spaces in text (treat consecutive spaces as one)

How would you count the number of spaces or newline characters in a text in such a way that consecutive spaces are counted only as one? For example, this is very close to what I want:

string = "This is an  example text.\n   But would be good if it worked."
counter = 0
for i in string:
    if i == ' ' or i == '\n':
        counter += 1
print(counter)

However, instead of printing 15, the result should be only 11.

Upvotes: 4

Views: 9023

Answers (9)

Mykola Zotko

Reputation: 17834

You can use the function groupby() to find groups of consecutive spaces:

from collections import Counter
from itertools import groupby

s = 'This is an  example text.\n   But would be good if it worked.'

c = Counter(k for k, _ in groupby(s, key=lambda x: ' ' if x == '\n' else x))
print(c[' '])
# 11
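
A possible variation on the same idea, offered as a sketch rather than as part of the original answer: passing str.isspace as the key makes groupby() split the text into alternating whitespace and non-whitespace runs, so the Counter is not needed. Note the assumption that any whitespace character (tabs included) should count, not only spaces and newlines. The name whitespace_runs is just illustrative.

```python
from itertools import groupby

s = 'This is an  example text.\n   But would be good if it worked.'

# groupby() yields one (key, group) pair per run of characters sharing a key.
# With str.isspace as the key, every run of whitespace becomes a single group.
whitespace_runs = sum(1 for is_space, _ in groupby(s, key=str.isspace) if is_space)
print(whitespace_runs)
# 11
```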

Upvotes: 1

ahmed boutayeb

Reputation: 7

Try:

def word_count(my_string):
    # Starts at 1 because N separators delimit N + 1 words
    count = 1
    for i in range(1, len(my_string)):
        # A space counts only if the previous character is not a space
        if my_string[i] == " " and my_string[i - 1] != " ":
            count += 1
    return count

Upvotes: 1

Padraic Cunningham

Reputation: 180441

You can use enumerate and check that the next character is not also whitespace, so each run of consecutive whitespace counts only once:

string = "This is an  example text.\n   But would be good if it worked."

print(sum(ch.isspace() and not string[i:i+1].isspace() for i, ch in enumerate(string, 1)))

You can also use iter with a generator function, keeping track of the last character and comparing:

def con(s):
    it = iter(s)
    prev = next(it)
    for ele in it:
        yield prev.isspace() and not ele.isspace()
        prev = ele
    yield prev.isspace()  # prev is the last character here, even for a one-character string

print(sum(con(string)))

An itertools version:

string = "This is an  example text.\n     But would be good if it worked.  "

from itertools import tee, zip_longest  # named izip_longest in Python 2

a, b = tee(string)
next(b)  # advance b so each pair is (current char, next char)
print(sum(cur.isspace() and not nxt.isspace() for cur, nxt in zip_longest(a, b, fillvalue="")))

Upvotes: 2

th3an0maly

Reputation: 3510

Assuming you are permitted to use Python regex:

import re
print(len(re.findall(r"[ \n]+", string)))

Quick and easy!

UPDATE: Additionally, use [\s] instead of [ \n] to match any whitespace character.
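
To make the difference between the two patterns concrete, here is a small sketch. The sample text and the variable names are made up for illustration; the sample contains a tab so that the two patterns disagree.

```python
import re

text = "one  two\tthree\n four"  # hypothetical sample containing a tab

# [ \n]+ matches runs made of spaces and newlines only, so the tab is ignored
space_newline_runs = len(re.findall(r"[ \n]+", text))
# \s+ matches runs of any whitespace character, so the tab is its own run
any_whitespace_runs = len(re.findall(r"\s+", text))

print(space_newline_runs)   # 2
print(any_whitespace_runs)  # 3
```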

Upvotes: 6

timgeb

Reputation: 78750

re to the rescue.

>>> import re
>>> string = "This is an  example text.\n   But would be good if it worked."
>>> spaces = sum(1 for match in re.finditer(r'\s+', string))
>>> spaces
11

This consumes minimal memory. An alternative solution that builds a temporary list would be

>>> len(re.findall(r'\s+', string))
11

If you only want to consider space characters and newline characters (as opposed to tabs, for example), use the regex '(\n| )+' instead of '\s+'.

Upvotes: 3

Thomite

Reputation: 741

The default str.split() function will treat consecutive runs of spaces as one. So simply split the string, get the size of the resulting list, and subtract one.

len(string.split())-1
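
One caveat that may be worth checking against your data: split() with no arguments discards leading and trailing whitespace, so runs at either end of the text are not counted. A small sketch with a made-up sample string (the names padded, via_split and via_regex are illustrative) comparing it with a regex count:

```python
import re

padded = "  leading and trailing  "  # hypothetical sample with padding on both ends

# split() yields ['leading', 'and', 'trailing'], so only the two inner runs count
via_split = len(padded.split()) - 1
# The regex also sees the leading and the trailing run
via_regex = len(re.findall(r"\s+", padded))

print(via_split)  # 2
print(via_regex)  # 4
```

If the text never starts or ends with whitespace, both approaches give the same number.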

Upvotes: 4

illright

Reputation: 4043

You can iterate through numbers to use them as indexes.

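counter = 0
for i in range(1, len(string)):
    # Count a whitespace character only when it starts a new run
    if string[i] in ' \n' and string[i-1] not in ' \n':
        counter += 1
# The loop skips index 0, so check the first character separately
if string[0] in ' \n':
    counter += 1
print(counter)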

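The first character is checked separately because the loop starts from the second character: at index 0, string[i-1] would be string[-1], which is the last character of the string rather than a previous one.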

Upvotes: 2

Mohammed Aouf Zouag

Reputation: 17142

You can do this:

string = "This is an  example text.\n   But would be good if it worked."
counter = 0
# A boolean flag indicating whether the previous character was a space
previous = False 
for i in string:
    if i == ' ' or i == '\n': 
        # The current character is a space
        previous = True # Setup for the next iteration
    else:
        # The current character is not a space, check if the previous one was
        if previous:
            counter += 1

        previous = False
print(counter)

Upvotes: 3

nhouser9

Reputation: 6780

Store the last character seen in a variable and set it to i at the end of each loop iteration. Then, inside your if, increase the counter only if the last character seen was not also a whitespace character.
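
A minimal sketch of how that could look when applied to the loop from the question. The variable name last is an assumption chosen for illustration; it starts as an empty string so that whitespace at the very beginning of the text is still counted.

```python
string = "This is an  example text.\n   But would be good if it worked."

counter = 0
last = ''  # last character seen; empty before the loop starts
for i in string:
    if i == ' ' or i == '\n':
        # Only count when the previous character was not whitespace too
        if last != ' ' and last != '\n':
            counter += 1
    last = i  # remember the current character for the next iteration
print(counter)
# 11
```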

Upvotes: 2
