Sundrah

Reputation: 865

Counting occurrences of a certain character in each element of a list

I want to find how many ' ' (whitespace) characters there are in each of these sentences, which are elements of a list. For example, given ['this is a sentence', 'this is one more sentence'], element 0 should give 3 and element 1 should give 4. I'm having trouble with two things: counting the whitespace, and looping through every element to find the one with the most whitespace.

Upvotes: 0

Views: 215

Answers (3)

mhawke

Reputation: 87134

You say "whitespace"; normally that would include the characters '\t\n\x0b\x0c\r ', plus Unicode whitespace characters such as u'\u3000' (IDEOGRAPHIC SPACE).
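To check that distinction, here is a small Python 3 sketch (an addition, not part of the original answer). It compares the standard library's string.whitespace constant, which holds only the ASCII whitespace characters listed above, with str.isspace(), which also recognises Unicode whitespace:

```python
import string

# string.whitespace holds the six ASCII whitespace characters
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'

# IDEOGRAPHIC SPACE is not in that constant,
# but str.isspace() still classifies it as whitespace
ideographic_space = '\u3000'
print(ideographic_space in string.whitespace)  # False
print(ideographic_space.isspace())             # True
```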

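A regex solution is one of the better options because it easily supports any Unicode whitespace code point in addition to the usual ASCII ones. Just use re.findall() with the re.UNICODE flag (in Python 3, str patterns are Unicode-aware by default, so the flag is redundant there but harmless):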

import re

def count_whitespace(s):
    return len(re.findall(r'\s', s, re.UNICODE))

l = ['this is a sentence',
     'this is one more sentence',
     '',
     u'\u3000\u2029    abcd\t\tefghi\0xb  \n\r\nj k  l\tm    \n\n',
     'nowhitespaceinthisstring']

for s in l:
    print(count_whitespace(s))

Output

3
4
0
23
0

An easy non-regex way to do this is with str.split(). With no arguments it splits on runs of any whitespace characters, so joining the pieces back together removes all whitespace from the string. This also works with Unicode whitespace characters:

def count_whitespace(s):
    return len(s) - len(''.join(s.split()))

for s in l:
    print(count_whitespace(s))

Output

3
4
0
23
0
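A third non-regex sketch (not from the original answer) counts the characters for which str.isspace() returns True. In Python 3 this is the same notion of whitespace that \s uses for str patterns, so it should give the same counts on this list, including 0 for the '\0' character, which is not whitespace:

```python
def count_whitespace_isspace(s):
    # Count every character that str.isspace() classifies as whitespace;
    # covers ASCII and Unicode whitespace without regex or split()
    return sum(1 for ch in s if ch.isspace())

l = ['this is a sentence',
     'this is one more sentence',
     '',
     u'\u3000\u2029    abcd\t\tefghi\0xb  \n\r\nj k  l\tm    \n\n',
     'nowhitespaceinthisstring']

for s in l:
    print(count_whitespace_isspace(s))
```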

Finally, picking out the sentence with the most whitespace characters:

>>> max((count_whitespace(s), s) for s in l)[1]
u'\u3000\u2029    abcd\t\tefghi\x00xb  \n\r\nj k  l\tm    \n\n'
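An alternative sketch (not from the original answer) passes the counting function as the key argument of max(). The two forms differ on ties: the tuple version above breaks ties by comparing the strings themselves, while key= returns the first maximal element in list order. The sample list below is made up to produce a tie:

```python
def count_whitespace(s):
    # Same split-based counter as above
    return len(s) - len(''.join(s.split()))

# 'x  y  z' has 4 spaces, tying with the second sentence
l = ['this is a sentence', 'this is one more sentence', 'x  y  z']

# key= compares by whitespace count only; ties go to the first element
print(max(l, key=count_whitespace))                  # this is one more sentence
# the tuple form breaks the tie by string comparison ('x' > 't')
print(max((count_whitespace(s), s) for s in l)[1])   # x  y  z
```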

Upvotes: 1

itzMEonTV

Reputation: 20359

You can use Counter. I don't know whether it is slower than .count().

>>> from collections import Counter
>>> lst = ['this is a sentence', 'this is one more sentence']
>>> [Counter(i)[' '] for i in lst]
[3, 4]
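To answer the speed question empirically, a timeit sketch (an addition, not from the original answer) can compare the two. Building a Counter tallies every character in the string, whereas str.count(' ') only searches for one character, so Counter is generally expected to be slower when a single count is all you need. Actual numbers depend on the machine and Python version, so none are shown:

```python
import timeit
from collections import Counter

lst = ['this is a sentence', 'this is one more sentence']

def with_counter():
    # Counter tallies every character, then we read off the space count
    return [Counter(s)[' '] for s in lst]

def with_count():
    # str.count scans only for the single character we care about
    return [s.count(' ') for s in lst]

# Both approaches should agree before comparing speed
assert with_counter() == with_count() == [3, 4]

# timeit.timeit accepts a callable as its statement
print('Counter  :', timeit.timeit(with_counter, number=10_000))
print('str.count:', timeit.timeit(with_count, number=10_000))
```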

Upvotes: 1

Bhargav Rao

Reputation: 52171

Use a simple list comprehension with str.count():

>>> lst = ['this is a sentence', 'this is one more sentence']
>>> [i.count(' ') for i in lst]
[3, 4]

Another way is to use map() (in Python 3, map() returns an iterator, so wrap it in list() to get a list back):

>>> list(map(lambda x: x.count(' '), lst))
[3, 4]

If you want a callable (a function you can apply to each element while looping through your list, as you mentioned), it can be implemented as

>>> def countspace(x):
...     return x.count(' ')
... 

and executed as

>>> for i in lst:
...     print(countspace(i))
... 
3
4

This can also be solved with a regex using the re module, as Grijesh mentioned:

>>> import re
>>> [len(re.findall(r"\s", i)) for i in lst]
[3, 4]
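Note that the two approaches are not equivalent: \s matches any whitespace character, while count(' ') only counts the space character, so they agree on lst above only because it contains nothing but spaces. A small sketch with a made-up list containing a tab and a newline:

```python
import re

# Hypothetical sample with a tab and a newline as well as spaces
lst = ['this is\ta sentence', 'one more\nsentence']

# count(' ') only sees the space character U+0020
print([i.count(' ') for i in lst])                # [2, 1]
# \s also matches tabs, newlines and other whitespace
print([len(re.findall(r'\s', i)) for i in lst])   # [3, 2]
```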

Post edit

Since you also need to find the element with the most spaces, you can do

>>> vals = [i.count(' ') for i in lst] 
>>> lst[vals.index(max(vals))]
'this is one more sentence'

This can be implemented as a callable using

>>> def getmax(lst):
...     vals = [i.count(' ') for i in lst]
...     maxel = lst[vals.index(max(vals))]
...     return (vals, maxel)
...

and use it as

>>> getmax(lst)
([3, 4], 'this is one more sentence')
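The getmax above raises ValueError on an empty list (max() of an empty sequence) and does not report where the winning sentence sits in the list. A sketch of a hypothetical variant, getmax_with_index (the name and return shape are assumptions, not from the original answer), that also returns the index and handles the empty case:

```python
def getmax_with_index(lst):
    # Hypothetical variant of getmax: returns (counts, index, element),
    # with index and element set to None when lst is empty
    vals = [s.count(' ') for s in lst]
    if not vals:
        return vals, None, None
    # max over the indices, keyed by the count; ties go to the first index
    best = max(range(len(vals)), key=vals.__getitem__)
    return vals, best, lst[best]

print(getmax_with_index(['this is a sentence', 'this is one more sentence']))
# ([3, 4], 1, 'this is one more sentence')
print(getmax_with_index([]))
# ([], None, None)
```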

Edit after comment: if all the sentences are in a single string, split it on '. ' first:

>>> s = 'this is a sentence. this is one more sentence'
>>> lst = s.split('. ')
>>> [i.count(' ') for i in lst]
[3, 4]
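Splitting on '. ' only works when every sentence ends with a period followed by exactly one space. A slightly more tolerant sketch (not from the original answer), assuming sentences end in ., ! or ? followed by any amount of whitespace; abbreviations such as "e.g." would still be split incorrectly:

```python
import re

s = 'this is a sentence! this is one more sentence?  and a third.'
# Split on whitespace that follows ., ! or ?;
# the lookbehind keeps the punctuation on the preceding sentence
sentences = re.split(r'(?<=[.!?])\s+', s.strip())
print(sentences)
# ['this is a sentence!', 'this is one more sentence?', 'and a third.']
print([sentence.count(' ') for sentence in sentences])  # [3, 4, 2]
```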

Upvotes: 3
