peterretief

Reputation: 2067

Format a python list and search for patterns

I am getting rows from a spreadsheet that contain a mixture of numbers, text and dates. I want to find elements within the list, some of them numbers and some of them text, for example:

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = list(map(str, sg))  # list() so the result can be reused on Python 3
#sg = map(unicode, sg) #option?
if any("-1" in s for s in sg):
    pass  # do something if matched

I don't feel this is the correct way to do this. I am also trying to match things like -1.5 and -1.5C, and to handle other unexpected values such as OPEN15 as opposed to 15.

I have also looked at

sg.index("-1")

If it returns an index then it's a match, but this is only good for exact matches (and it raises ValueError when nothing matches).

Some help would be appreciated

Upvotes: 0

Views: 59

Answers (2)

jean-loup

Reputation: 609

If you want to call a function for each case, I would do it this way:

def stub1(elem):
    #do something for match of type '-1'
    return
def stub2(elem):
    #do something for match of type 'SD4'
    return        
def stub3(elem):
    #do something for match of type 'OPEN15'
    return

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = [str(s) for s in sg]  # on Python 2, use map(unicode, sg) instead
patterns = {u"-1": stub1, u"SD4": stub2, u"OPEN15": stub3}  # add more if you want

for elem in sg:
    for k, stub in patterns.items():  # iteritems() on Python 2
        if k in elem:
            stub(elem)
            break

Here stub1, stub2, ... are the functions that contain the code for each case. For each string, the first pattern found as a substring triggers its function, so at most one function is called per string.
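For the values mentioned in the question (-1.5, -1.5C, OPEN15 versus 15), a plain substring test can over-match: "15" in "OPEN15" is true. Below is a minimal sketch of one alternative, using a regular expression that only accepts cells that look like a number with an optional letter suffix. The pattern, the helper name classify and the extended sample row are assumptions made for illustration, not part of the original data or answer.

```python
import re

# Hypothetical pattern: optional sign, digits, optional decimal part,
# then an optional alphabetic suffix such as "C".
NUMBER_RE = re.compile(r"([-+]?\d+(?:\.\d+)?)([A-Za-z]*)")


def classify(cell):
    """Return (value, suffix) if the whole cell looks numeric, else None."""
    # fullmatch requires the entire string to fit, so "OPEN15" is rejected.
    match = NUMBER_RE.fullmatch(str(cell).strip())
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Sample row, extended with the tricky values from the question.
sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1.5, u'-1.5C', u'OPEN15']

# Keep only the cells that parse as a negative number.
negatives = [cell for cell in sg
             if classify(cell) is not None and classify(cell)[0] < 0]
```

With this approach the numeric test and the text test stay separate: cells for which classify returns None can still be sent through the substring dispatch above.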

Upvotes: 1

dcexcal

Reputation: 195

What do you mean by "I don't feel this is the correct way to do this"? Are you not getting the result you expect? Is it too slow?

Maybe you can organize your data by columns instead of rows and use more specific filters. If you are looking for speed, I'd suggest the numpy module, which has a very interesting function called select().

Scipy select example

By transforming all your rows into a numpy array, you can test several columns in one pass. This function is amazingly efficient and powerful! Basically, it's used like this:

import numpy as np

a = np.array(...)
conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a / 3, a * 10]
result = np.select(conds, actions, default=0)

All values in a will be transformed as follows (the first matching condition wins):

  • 100 is added to any value of a that is smaller than 10
  • Any remaining value that is a multiple of 3 is divided by 3
  • Any remaining value above 25 is multiplied by 10
  • Any other value, matching none of the conditions, is set to 0

Both conds and actions are lists, and they must have the same number of elements. The first element of conds is paired with the first element of actions, and so on.
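As a concrete illustration, here is a small runnable sketch of the same call. The sample values are an assumption chosen to hit every branch, and floor division (//) is used so the result stays an integer array.

```python
import numpy as np

# Sample data, chosen for illustration only.
a = np.array([3, 9, 12, 20, 26, 30])

conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a // 3, a * 10]  # one action per condition, same order

# For each element, the action of the first true condition is used;
# elements matching no condition get the default.
result = np.select(conds, actions, default=0)
print(result.tolist())  # [103, 109, 4, 0, 260, 10]
```

Note that 3 and 9 are multiples of 3 but also smaller than 10, so they take the first action, and 30 is above 25 but is caught earlier by the multiple-of-3 condition.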

It could be used to determine the index of a particular value in a vector (even though this should be done using the numpy nonzero() function).

a = np.array(...)
conds = [a <= target, a > target]
actions = [1, 0]
index = np.select(conds, actions).sum()  # number of elements <= target

This is probably a stupid way of getting an index, but it demonstrates how we can use select()... and it works :-)
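For comparison, a minimal sketch of the nonzero() route mentioned above, next to the select()-based count. The sorted sample array and the target value are assumptions for illustration.

```python
import numpy as np

a = np.array([2, 5, 7, 11, 13])  # sorted sample data (assumption)
target = 7

# Indices of every element equal to target.
matches = np.nonzero(a == target)[0]

# The select()-based approach: counts the elements <= target.
count = np.select([a <= target, a > target], [1, 0]).sum()

# On sorted data, searchsorted gives the same count directly.
position = np.searchsorted(a, target, side='right')
```

The select() version returns a count rather than a position, so it only corresponds to an index when the array is sorted; nonzero() works on unsorted data and also reports when the value is absent (an empty result).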

Upvotes: 1
