archienorman

Reputation: 1454

Python - Count elements of a list within a range of specified values

I have a large list of words:

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

I would like to be able to count the number of elements in between (and including) the [tag] elements across the whole list. The goal is to be able to see the frequency distribution.

Can I use range() to start and stop on a string match?

Upvotes: 2

Views: 7957

Answers (6)

postelrich

Reputation: 3496

Solution using list comprehension and string manipulation.

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

# string together your list
my_str = ','.join(my_list)

# split the giant string by tag, gives you a list of comma-separated strings;
# drop the empty string that comes before the first [tag]
my_tags = [t for t in my_str.split('[tag]') if t]

# split for each word in each tag string, stripping the leftover leading/trailing commas
my_words = [t.strip(',').split(',') for t in my_tags]

# count up each list to get a list of counts for each tag, adding one since the split removed [tag]
my_cnt = [1 + len(w) for w in my_words]  # [7, 5, 4]

Do it in one line:

# all as one list comprehension starting with just the string
[1 + len(t.strip(',').split(',')) for t in my_str.split('[tag]') if t]

Upvotes: 0

Hooting

Reputation: 1711

First, find all indices of [tag]; the difference between adjacent indices is the number of elements in each group. Appending len(my_list) as a final boundary makes sure the last group is counted too.

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
indices = [i for i, x in enumerate(my_list) if x == "[tag]"]
indices.append(len(my_list))  # end boundary so the last group is counted
nums = []
for i in range(1, len(indices)):
    nums.append(indices[i] - indices[i-1])

A faster way to find all the indices is to use numpy, as shown below:

import numpy as np
values = np.array(my_list)
searchval = '[tag]'
ii = np.where(values == searchval)[0]
print(ii)

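Another way to get the differences between adjacent indices is to zip the list with a copy of itself shifted by one (in Python 2, itertools.izip does this lazily; in Python 3 the built-in zip is already lazy):

diffs = [y - x for x, y in zip(indices, indices[1:])]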

Upvotes: 5

Tofystedeth

Reputation: 375

Borrowing and slightly modifying the generator code from the selected answer to this question:

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

def group(seq, sep):
    g = []
    for el in seq:
        g.append(el)
        if el == sep:
            yield g
            g = []

counts = [len(x) for x in group(my_list,'[/tag]')]

I changed the generator they gave in that answer to not return the empty list at the end and to include the separator in the list instead of putting it in the next list. Note that this assumes there will always be a matching '[tag]' '[/tag]' pair in that order, and that all the elements in the list are between a pair.

After running this, counts will be [7, 5, 4].
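
Since the question's stated goal is a frequency distribution, one possible next step is to tally the group sizes with collections.Counter. This is only a minimal sketch: the counts list is hard-coded here to the result above so that the snippet runs on its own.

```python
from collections import Counter

# Group sizes as produced by the generator above (hard-coded for this sketch).
counts = [7, 5, 4]

# Map each group size to the number of groups that have that size.
distribution = Counter(counts)

for size, freq in sorted(distribution.items()):
    print('{} elements: {} group(s)'.format(size, freq))
```

With real data, many groups will share a size, and distribution.most_common() lists the sizes from most to least frequent.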

Upvotes: 0

ajsp

Reputation: 2670

I would go with the following since the OP wants to count the actual values. (No doubt he has figured out how to do that by now.)

starts = [k for k, w in enumerate(my_list) if w == '[tag]']
ends = [k for k, w in enumerate(my_list) if w == '[/tag]']
for start, end in zip(starts, ends):
    print(end - start + 1)  # + 1 so both the opening and closing tag are counted

Upvotes: 0

Noctis Skytower

Reputation: 22001

This should allow you to find the number of words between and including your tags:

MY_LIST = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]',
           'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']


def main():
    ranges = find_ranges(MY_LIST, '[tag]', '[/tag]')
    for index, pair in enumerate(ranges, 1):
        print('Range {}: Start = {}, Stop = {}'.format(index, *pair))
        start, stop = pair
        print('         Size of Range =', stop - start + 1)


def find_ranges(iterable, start, stop):
    range_start = None
    for index, value in enumerate(iterable):
        if value == start:
            if range_start is None:
                range_start = index
            else:
                raise ValueError('a start was duplicated before a stop')
        elif value == stop:
            if range_start is None:
                raise ValueError('a stop was seen before a start')
            else:
                yield range_start, index
                range_start = None

if __name__ == '__main__':
    main()

This example will print out the following text so you can see how it works:

Range 1: Start = 0, Stop = 6
         Size of Range = 7
Range 2: Start = 7, Stop = 11
         Size of Range = 5
Range 3: Start = 12, Stop = 15
         Size of Range = 4

Upvotes: 0

Christian Witts

Reputation: 11585

You can use .index(value, [start, [stop]]) to search through the list.

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
my_list.index('[tag]')   # will return 0, as it occurs at the zeroth element
my_list.index('[/tag]')  # will return 6

That will get you your first group length. On the next iteration you just need to remember the index of the last closing tag and use that, plus 1, as the start point:

my_list.index('[tag]', 7)     # will return 7
my_list.index('[/tag]', 7)    # will return 11

Do that in a loop until you've reached the last closing tag in your list. Also remember that .index will raise a ValueError if the value is not present, so you'll need to handle that exception when it occurs.
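
The loop described above could look roughly like the following. It is a sketch rather than a definitive implementation: the function name tag_group_sizes and its parameters are made up for illustration, and it assumes the tags are properly paired and not nested.

```python
def tag_group_sizes(words, open_tag='[tag]', close_tag='[/tag]'):
    """Return the inclusive size of each open_tag ... close_tag group."""
    sizes = []
    start = 0
    while True:
        try:
            open_idx = words.index(open_tag, start)
            close_idx = words.index(close_tag, open_idx + 1)
        except ValueError:
            # No further opening tag (or no matching closing tag): stop.
            break
        sizes.append(close_idx - open_idx + 1)  # + 1 counts both tags
        start = close_idx + 1  # resume just after the last closing tag
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(tag_group_sizes(my_list))  # [7, 5, 4]
```

Catching the ValueError doubles as the loop's exit condition, so a trailing group with no closing tag is silently skipped. Raise an error there instead if that should count as bad input.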

Upvotes: 1
