Reputation: 1454
I have a large list of words:
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
I would like to count the number of elements between (and including) each [tag] and [/tag] pair across the whole list. The goal is to see the frequency distribution of those group sizes.
Can I use range() to start and stop on a string match?
Upvotes: 2
Views: 7957
Reputation: 3496
Solution using list comprehension and string manipulation.
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
# string together your list (assumes no word contains a comma)
my_str = ','.join(my_list)
# split the giant string by tag; drop the empty piece before the first [tag]
# and strip the commas left at each end of every piece
my_tags = [t.strip(',') for t in my_str.split('[tag]') if t]
# split each tag string back into its words
my_words = [t.split(',') for t in my_tags]
# count up each list to get a list of counts for each tag, adding one since the split removed [tag]
my_cnt = [1 + len(w) for w in my_words]
Or do it in one line:
# all as one list comprehension starting with just the string
[1 + len(t.strip(',').split(',')) for t in my_str.split('[tag]') if t]
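The join-and-split trick miscounts if any word itself contains a comma, because that word is split in two. A minimal sketch of the same idea, assuming the ASCII unit separator '\x1f' never appears inside any word and that words are non-empty and never contain '[tag]' (assumptions, not guarantees from the question); the helper name tag_group_sizes is hypothetical:

```python
def tag_group_sizes(words, open_tag='[tag]', sep='\x1f'):
    # Assumption: sep never occurs inside any word, so splitting on it
    # recovers exactly the original elements.
    joined = sep.join(words)
    sizes = []
    for chunk in joined.split(open_tag):
        # Remove the separators left at each end of the chunk by the split.
        chunk = chunk.strip(sep)
        if not chunk:
            continue  # the empty piece before the first open_tag
        # +1 adds back the open_tag that split() consumed.
        sizes.append(1 + len(chunk.split(sep)))
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(tag_group_sizes(my_list))                     # [7, 5, 4]
print(tag_group_sizes(['[tag]', 'a,b', '[/tag]']))  # [3]: the comma no longer splits 'a,b'
```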
Upvotes: 0
Reputation: 1711
First, find all indices of [tag]; the difference between adjacent indices is the number of elements in each group.
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
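indices = [i for i, x in enumerate(my_list) if x == "[tag]"]
# add the list length as a final boundary so the last group is counted as well
bounds = indices + [len(my_list)]
nums = []
for i in range(1, len(bounds)):
    nums.append(bounds[i] - bounds[i-1])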
Another way to find all indices is with numpy, which can be faster for large inputs (though converting the list to an array has its own cost):
import numpy as np
values = np.array(my_list)
searchval = '[tag]'
ii = np.where(values == searchval)[0]
print(ii)  # [ 0  7 12]
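Another way to get the differences between adjacent indices is to pair each index with the next one using zip (itertools.izip in Python 2):
# len(my_list) acts as the boundary after the last group
diffs = [y - x for x, y in zip(indices, indices[1:] + [len(my_list)])]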
Upvotes: 5
Reputation: 375
Borrowing and slightly modifying the generator code from the selected answer to this question:
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
def group(seq, sep):
    g = []
    for el in seq:
        g.append(el)
        if el == sep:
            yield g
            g = []
counts = [len(x) for x in group(my_list, '[/tag]')]
I changed the generator from that answer so that it no longer yields an empty list at the end, and so that it includes the separator in the current group instead of putting it in the next one. Note that this assumes every '[tag]' has a matching '[/tag]' after it, in that order, and that every element of the list lies between such a pair.
After running this, counts will be [7,5,4]
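Since the question's goal is a frequency distribution, the per-group lengths can be tallied with collections.Counter from the standard library. A minimal sketch; the counts list below is made up for illustration (for the list in the question it would be [7, 5, 4]):

```python
from collections import Counter

# Hypothetical group lengths from a longer list (illustrative values only).
counts = [7, 5, 4, 5, 5, 7]

# Map each group length to how many groups have that length.
freq = Counter(counts)

# Print the distribution in order of group length.
for length, n in sorted(freq.items()):
    print(f'{length} elements: {n} group(s)')
```

This prints one line per distinct length, starting with "4 elements: 1 group(s)".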
Upvotes: 0
Reputation: 2670
I would go with the following since the OP wants to count the actual values. (No doubt he has figured out how to do that by now.)
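# indices of every opening and closing tag
starts = [k for k, w in enumerate(my_list) if w == '[tag]']
stops = [k for k, w in enumerate(my_list) if w == '[/tag]']
for start, stop in zip(starts, stops):
    # +1 so that both tags are included in the count
    print(stop - start + 1)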
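zip() stops at the shorter of its inputs, so if an opening or closing tag is missing, the start/stop pairing silently drifts or drops a group. A minimal sketch that checks the pairing before counting, assuming tags are never nested; the function name tag_sizes is hypothetical:

```python
def tag_sizes(words, open_tag='[tag]', close_tag='[/tag]'):
    starts = [k for k, w in enumerate(words) if w == open_tag]
    stops = [k for k, w in enumerate(words) if w == close_tag]
    if len(starts) != len(stops):
        raise ValueError(f'{len(starts)} opening tags but {len(stops)} closing tags')
    sizes = []
    prev_stop = -1
    for start, stop in zip(starts, stops):
        # Each group must open after the previous one closed, and close after it opens.
        if not prev_stop < start < stop:
            raise ValueError(f'mismatched tags around index {start}')
        sizes.append(stop - start + 1)  # +1 so both tags are counted
        prev_stop = stop
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(tag_sizes(my_list))  # [7, 5, 4]
```

A nested or out-of-order sequence such as ['[tag]', 'a', '[tag]', '[/tag]', '[/tag]'] raises ValueError instead of returning misleading counts.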
Upvotes: 0
Reputation: 22001
This should allow you to find the number of words between and including your tags:
MY_LIST = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]',
           'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
def main():
    ranges = find_ranges(MY_LIST, '[tag]', '[/tag]')
    for index, pair in enumerate(ranges, 1):
        print('Range {}: Start = {}, Stop = {}'.format(index, *pair))
        start, stop = pair
        print(' Size of Range =', stop - start + 1)
def find_ranges(iterable, start, stop):
    range_start = None
    for index, value in enumerate(iterable):
        if value == start:
            if range_start is None:
                range_start = index
            else:
                raise ValueError('a start was duplicated before a stop')
        elif value == stop:
            if range_start is None:
                raise ValueError('a stop was seen before a start')
            else:
                yield range_start, index
                range_start = None
if __name__ == '__main__':
    main()
This example will print out the following text so you can see how it works:
Range 1: Start = 0, Stop = 6
Size of Range = 7
Range 2: Start = 7, Stop = 11
Size of Range = 5
Range 3: Start = 12, Stop = 15
Size of Range = 4
Upvotes: 0
Reputation: 11585
You can use list.index(x[, start[, end]]) to search through the list.
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
my_list.index('[tag]')   # returns 0, as it occurs at the zeroth element
my_list.index('[/tag]')  # returns 6
That gives you the first group's length (6 - 0 + 1 = 7). On the next iteration, remember the index of the last closing tag and use that index plus 1 as the new start point:
my_list.index('[tag]', 7)   # returns 7
my_list.index('[/tag]', 7)  # returns 11
Repeat this in a loop until you have reached the last closing tag in the list.
Also remember that .index raises a ValueError if the value is not present, so you'll need to handle that exception when it occurs.
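The loop described above might look like this. A minimal sketch, assuming every '[tag]' has a matching '[/tag]' after it; the helper name index_group_sizes is hypothetical. The ValueError raised by .index when no further '[tag]' exists is what ends the loop, while an unclosed '[tag]' lets the ValueError from the second .index call propagate:

```python
def index_group_sizes(words, open_tag='[tag]', close_tag='[/tag]'):
    sizes = []
    pos = 0
    while True:
        try:
            start = words.index(open_tag, pos)
        except ValueError:
            break  # no more opening tags: done
        # Assumes a matching close tag follows; if not, .index raises ValueError here.
        stop = words.index(close_tag, start + 1)
        sizes.append(stop - start + 1)  # include both tags
        pos = stop + 1  # resume searching just past the closing tag
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(index_group_sizes(my_list))  # [7, 5, 4]
```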
Upvotes: 1