Reputation: 77
I have a Python list made up of positions and scores.
I need to write code that reports the start and end positions of regions whose scores are over a certain cutoff value.
Any ideas as to how to filter through the list and find these regions?
Upvotes: 1
Views: 658
Reputation: 18385
You'll want to convert the score strings to floating-point numbers, compare them against 0.6, and keep only the items that pass.
There are a few ways to do this in Python, with the last generally being the most "Pythonic".
To start, the method that is probably easiest to understand for a new programmer is iteration: start with an empty list, then append to it each item that passes your test.
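>>> lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']
>>> tmp = []
>>> for item in lst:
...     _discard, test = item.split()
...     test = float(test)
...     if test >= 0.6:
...         tmp.append(item)
...
>>> tmp
['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']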
Another approach you may encounter uses the built-in function filter. filter calls a function on every item of its second argument, an iterable; items for which the function returns a true value are kept, and items that fail are dropped. In Python 2, filter returns a new list; in Python 3, it returns a lazy iterator, so wrap the call in list() to get a list back.
To make this work, we use an anonymous function, written with the lambda syntax, as our test. This makes the code a little harder to read if you're not familiar with the syntax. More experienced programmers will often prefer this method to the first, though, because it is concise and clear: the name filter states exactly what you're trying to do.
>>> list(filter(lambda item: float(item.split()[1]) >= 0.6, lst))
['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']
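If the lambda is hard to read, note that filter accepts any callable, so the test can be an ordinary named function instead. A minimal sketch; the function name passes_cutoff and its cutoff parameter (defaulting to 0.6) are illustrative assumptions, not part of the answer:

```python
lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']

def passes_cutoff(item, cutoff=0.6):
    """Hypothetical helper: True if the score (second field of 'pos score') is >= cutoff."""
    _pos, score = item.split()
    return float(score) >= cutoff

# filter() returns an iterator in Python 3, so wrap it in list()
kept = list(filter(passes_cutoff, lst))
print(kept)  # ['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']
```

A named function also gives you one place to change the cutoff or the parsing rules later.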
The last, and probably the most common approach these days, is a list comprehension. It puts the loop and the test in a single expression, without a separate function or lambda. It is usually faster than filter with a lambda, since it avoids calling a Python function for each item, but it can be a bit confusing for newcomers.
>>> [item for item in lst if float(item.split()[1]) >= 0.6]
['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']
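All three versions re-split each string inside the test and keep the original strings. If you need the numbers afterwards (for example, to work out where regions start and end), it can be simpler to parse each string once into a (position, score) tuple. A sketch assuming every item has exactly two whitespace-separated fields; the names pairs and above are illustrative:

```python
lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']

# Parse each "position score" string once into an (int, float) tuple
pairs = [(int(pos), float(score)) for pos, score in (item.split() for item in lst)]

# Filtering is now a plain comparison on the score field
above = [(pos, score) for pos, score in pairs if score >= 0.6]
print(above)  # [(101, 0.7), (102, 0.8), (103, 0.7), (105, 0.7), (106, 0.8)]
```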
Upvotes: 0
Reputation: 598
Try this one:
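lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']
start = False     # True while we are inside a region
results = []      # collected (startPos, endPos) tuples
prevEndPos = -1   # last position seen inside the current region
for e in lst:
    elems = e.split()
    pos = int(elems[0])
    score = float(elems[1])
    if score >= 0.6:
        if not start:
            # a new region begins at this position
            start = True
            startPos = pos
        prevEndPos = pos
    else:
        if start:
            # the current region ended at the previous position
            start = False
            endPos = prevEndPos
            results.append((startPos, endPos))
# close a region that runs to the end of the list
if start:
    results.append((startPos, prevEndPos))
print(results)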
Be careful not to name a variable list: list is a built-in type in Python, and even though the code will still work, your variable hides (shadows) the built-in name.
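A short illustration of what shadowing does in practice; the two-item sample list and the error_message variable are hypothetical:

```python
# Binding the name `list` hides the built-in type for the rest of the module
list = ['100 0.0', '101 0.7']

try:
    list('abc')  # calls the data list, not the built-in type
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)  # TypeError: 'list' object is not callable

del list  # remove the shadowing name; the built-in is visible again
print(list('abc'))  # ['a', 'b', 'c']
```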
You may save the results in a list of tuples (as above), a dict of tuples, a list of lists, or a dict of lists; any of these works.
Output:
[(101, 103), (105, 106)]
The output means that the first region starts at 101 and ends at 103, and the second region starts at 105 and ends at 106.
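The same regions can also be found with itertools.groupby, which groups consecutive items sharing a key; here the key is whether an item's score passes the cutoff. This is a swapped-in technique rather than this answer's method, and the function name find_regions and its cutoff parameter are illustrative assumptions. Like the loop, it treats items that are adjacent in the list as adjacent positions, and it also reports a region that runs to the end of the list:

```python
from itertools import groupby

def find_regions(items, cutoff=0.6):
    """Hypothetical helper: (start, end) positions of consecutive runs with score >= cutoff."""
    # Parse "pos score" strings into (int, float) tuples
    parsed = [(int(p), float(s)) for p, s in (item.split() for item in items)]
    regions = []
    # groupby yields one group per run of items with the same key (True/False)
    for passes, group in groupby(parsed, key=lambda ps: ps[1] >= cutoff):
        if passes:
            run = list(group)
            regions.append((run[0][0], run[-1][0]))
    return regions

lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']
print(find_regions(lst))  # [(101, 103), (105, 106)]
```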
Upvotes: 0
Reputation: 10951
You can also use the built-in function filter, like this (in Python 3, wrap it in list(), because filter returns an iterator):
>>> list(filter(lambda s: float(s.split()[-1]) >= 0.6, lst))
['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']
Upvotes: 0
Reputation: 19825
I want to find regions where the score is .6 or greater
In [14]: [int(s.split()[0]) for s in lst if float(s.split()[1]) >= 0.6]
Out[14]: [101, 102, 103, 105, 106]
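This returns individual positions rather than regions. If positions are integers, consecutive runs can be collapsed into (start, end) pairs with a common groupby idiom keyed on position minus index, a value that stays constant within a run of consecutive integers. A sketch assuming the positions are sorted; positions_to_ranges is an illustrative name:

```python
from itertools import groupby

def positions_to_ranges(positions):
    """Hypothetical helper: collapse sorted integer positions into (start, end) runs."""
    ranges = []
    # position - index is constant within a run of consecutive integers
    for _key, group in groupby(enumerate(positions), key=lambda ip: ip[1] - ip[0]):
        run = [pos for _index, pos in group]
        ranges.append((run[0], run[-1]))
    return ranges

print(positions_to_ranges([101, 102, 103, 105, 106]))  # [(101, 103), (105, 106)]
```

Because it looks at gaps in the position numbers themselves, this version splits regions wherever a position is missing, even if the surviving items were adjacent in the list.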
Upvotes: 1
Reputation: 174816
Keep only the elements whose second number is greater than or equal to 0.6:
>>> lst = ['100 0.0', '101 0.7', '102 0.8', '103 0.7', '104 0.0', '105 0.7', '106 0.8', '107 0.0']
>>> [i for i in lst if float(i.split()[1]) >= 0.6]
['101 0.7', '102 0.8', '103 0.7', '105 0.7', '106 0.8']
Upvotes: 1