Reputation: 77

specifying start and end positions from python list

I have a python list that is made up of positions and scores.

I need to write code that gives the start and end positions of regions whose scores are over a certain cutoff value.

Any ideas as to how to filter through the list and find these regions?

Upvotes: 1

Answers (5)

Tim McNamara

Reputation: 18385

You'll want to split each string, convert the score to a floating-point number, compare it against 0.6, and keep only the items that pass.

There are a few ways to do this in Python, with the last generally being the most "Pythonic".

To start, the method that is probably easiest for a new programmer to understand is iteration. Start with an empty list, then append the members that pass your test.

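>>> lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']
>>> tmp = []
>>> for item in lst:
...     _discard, test = item.split()   # position is unused here
...     test = float(test)
...     if test >= 0.6:
...         tmp.append(item)
...
>>> tmp
['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']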

Another approach you may encounter uses the built-in function filter. filter calls a function on every item of its second argument, an iterable, and keeps only the items for which that function returns True; items that fail are dropped. In Python 3, filter returns a lazy iterator rather than a list, so wrap the call in list() when you need a list.

To make this work, we use an anonymous function, written with the lambda syntax, as our test. This makes it a little harder to read if you're not familiar with the syntax. More experienced programmers typically prefer this method to the first, though, because it is concise and clear: the name filter says exactly what you intend to do.

>>> list(filter(lambda item: float(item.split()[1]) >= 0.6, lst))
['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']
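If the lambda syntax is the stumbling block, an ordinary named function works exactly the same way with filter. This is a minimal sketch; the helper name passes and its cutoff default are illustrative and not part of the original answer.

```python
lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']

def passes(item, cutoff=0.6):
    """Hypothetical helper: True if the score (second field) is at least cutoff."""
    return float(item.split()[1]) >= cutoff

# In Python 3, filter() returns an iterator, so wrap it in list()
high = list(filter(passes, lst))
print(high)  # ['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']
```

The named function is equivalent to the lambda above; it is just easier to read, test, and reuse with a different cutoff.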

The last, and probably most common, approach these days is a list comprehension. It bundles the loop and the test into a single expression, with no separate lambda and no explicit append calls. It is usually faster than the equivalent loop, but can be a bit confusing for newcomers.

>>> [item for item in lst if float(item.split()[1]) >= 0.6]
['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']
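The three approaches above return the passing items, but not the start and end of each region, which is what the question asks for. One way to get there, sketched here with itertools.groupby (a technique not used in this answer), is to group consecutive items by whether they pass the cutoff and keep the first and last position of each passing run. It assumes, as in the sample data, that the list is already ordered by position; the helper name above is illustrative.

```python
from itertools import groupby

lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']
cutoff = 0.6

def above(item):
    # hypothetical helper: True when the item's score meets the cutoff
    return float(item.split()[1]) >= cutoff

regions = []
# groupby splits lst into consecutive runs that share the same above() value
for is_high, run in groupby(lst, key=above):
    if is_high:
        run = list(run)
        start = int(run[0].split()[0])   # position of the first item in the run
        end = int(run[-1].split()[0])    # position of the last item in the run
        regions.append((start, end))

print(regions)  # [(101, 103), (105, 106)]
```

Because groupby only groups adjacent items, a region that runs to the end of the list is handled without any special case.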

Upvotes: 0

Mihai Hangiu

Reputation: 598

Try this one:

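lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']
start = False      # True while we are inside a region of high scores
results = []
prevEndPos = -1
for e in lst:
    elems = e.split()
    pos = int(elems[0])
    score = float(elems[1])
    if score >= 0.6:
        if not start:
            # first high score after a low one: a new region begins
            start = True
            startPos = pos
        prevEndPos = pos   # extend the current region
    else:
        if start:
            # first low score after a region: close it
            start = False
            results.append((startPos, prevEndPos))

# close a region that runs to the end of the list
if start:
    results.append((startPos, prevEndPos))

print(results)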

Be careful not to name a variable 'list': list is a built-in type in Python, and even though the code will still work, your variable will hide (shadow) the built-in name.

You can store the results as a list of tuples (as above), a dict of tuples, a list of lists, or a dict of lists; any of these works.

Output:

[(101, 103), (105, 106)]

The output means the first region starts at 101 and ends at 103, and the second region starts at 105 and ends at 106.
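If you prefer one of the other containers mentioned above, converting the list of tuples is a one-liner. This is a minimal sketch with the region list hard-coded from the output shown; the variable names are illustrative.

```python
# results as printed by the loop above: a list of (start, end) tuples
results = [(101, 103), (105, 106)]

# Alternative: a dict of tuples keyed by region number, counting from 1
regions_by_number = dict(enumerate(results, start=1))
print(regions_by_number)  # {1: (101, 103), 2: (105, 106)}

# Alternative: a list of [start, end] lists
as_lists = [list(region) for region in results]
print(as_lists)  # [[101, 103], [105, 106]]
```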

Upvotes: 0

Iron Fist

Reputation: 10951

You can also use the built-in function filter, like this (in Python 3, wrap it in list() to get a list back):

>>> list(filter(lambda s: float(s.split()[-1]) >= 0.6, lst))
['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']

Upvotes: 0

Sait

Reputation: 19825

I want to find regions where the score is .6 or greater

In [14]: [int(l.split()[0]) for l in lst if float(l.split()[1]) >= 0.6]
Out[14]: [101, 102, 103, 105, 106]
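This gives the passing positions, but not yet the regions. To collapse them into (start, end) pairs, one common trick (not part of this answer) is to group by position minus index: within a run of consecutive integers that difference stays constant. A minimal sketch, assuming positions are integers and that a gap in the positions marks the end of a region, which holds here because the low-scoring positions were filtered out.

```python
from itertools import groupby

positions = [101, 102, 103, 105, 106]

regions = []
# Within a run of consecutive integers, position - index is constant,
# so grouping on that difference splits the list into runs.
for _, run in groupby(enumerate(positions), key=lambda pair: pair[1] - pair[0]):
    run = [pos for _, pos in run]
    regions.append((run[0], run[-1]))

print(regions)  # [(101, 103), (105, 106)]
```

Note that this would also split a region wherever the original data itself skips a position, even if every score passed.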

Upvotes: 1

Avinash Raj

Reputation: 174816

Keep only the elements whose second number is greater than or equal to 0.6:

>>> lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']
>>> [i for i in lst if float(i.split()[1]) >= 0.6]
['101  0.7', '102  0.8', '103  0.7', '105  0.7', '106  0.8']
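Every snippet here re-splits the string each time it needs a field. If you need the positions and scores more than once, it may be simpler to parse each string a single time into a (position, score) tuple. A minimal sketch, assuming every item has exactly two whitespace-separated fields; the names pairs and high are illustrative.

```python
lst = ['100  0.0', '101  0.7', '102  0.8', '103  0.7', '104  0.0', '105  0.7', '106  0.8', '107  0.0']

# Parse each "position  score" string once into an (int, float) tuple
pairs = [(int(pos), float(score)) for pos, score in (item.split() for item in lst)]

# Filter on the already-parsed score
high = [(pos, score) for pos, score in pairs if score >= 0.6]
print(high)  # [(101, 0.7), (102, 0.8), (103, 0.7), (105, 0.7), (106, 0.8)]
```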

Upvotes: 1
