mmann1123

Reputation: 5295

Grep on elements of a list

I have a list of files names:

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']

While I have found some functions that can work to grep character strings, I haven't figured out how to grep all elements of a list.

For instance, I would like to call:

grep(names,'aet')

and get:

['aet2000','aet2001']

I'm sure it's not too hard, but I am new to Python.


Update: The question above apparently wasn't precise enough. All the answers below work for the example but not for my actual data. Here is my code to make the list of file names:

years = range(2000,2011)
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]     #  *variable name*  with wildcards   
tifnames = list(range(0,(len(years)*len(months)*len(variables)+1)  ))
i = 0
for variable in variables:
   for year in years:
      for month in months:
         fullname = str(variable)+str(year)+str(month)+".tif"
         tifnames[i] = fullname
         i = i+1 

Running filter(lambda x: 'aet' in x, tifnames), or any of the other answers, returns:

Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    func(tifnames,'aet')
  File "<pyshell#88>", line 2, in func
    return [i for i in l if s in i]
TypeError: argument of type 'int' is not iterable

Despite the fact that tifnames is a list of character strings:

type(tifnames[1])
<type 'str'>

Do you guys see what's going on here? Thanks again!

Upvotes: 44

Views: 61785

Answers (5)

Ashwini Chaudhary

Reputation: 250941

Use filter():

>>> names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
>>> filter(lambda x:'aet' in x, names)
['aet2000', 'aet2001']

With a regex:

>>> import re
>>> filter(lambda x: re.search(r'aet', x), names)
['aet2000', 'aet2001']

In Python 3, filter() returns an iterator, so call list() on it to get a list:

>>> list(filter(lambda x:'aet' in x, names))
['aet2000', 'aet2001']

Otherwise, use a list comprehension (it works in both Python 2 and 3):

>>> [name for name in names if 'aet' in name]
['aet2000', 'aet2001']

Upvotes: 70

root

Reputation: 80346

>>> names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']
>>> def grep(l, s):
...     return [i for i in l if s in i]
... 
>>> grep(names, 'aet')
['aet2000', 'aet2001']

Regex version, closer to grep, although not needed in this case:

>>> import re
>>> def func(l, s):
...     return [i for i in l if re.search(s, i)]
... 
>>> func(names, r'aet')
['aet2000', 'aet2001']
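Because func passes s straight to re.search, any regex metacharacters in s are interpreted rather than matched literally. A small Python 3 illustration with hypothetical file names: '.' matches any character, so '.tif' also hits 'aet2000tif', while wrapping the pattern in re.escape() restores a literal match.

```python
import re

def func(l, s):
    # s is treated as a regular expression, not a literal substring
    return [i for i in l if re.search(s, i)]

files = ['aet2000.tif', 'aet2000tif', 'aet2000.txt']  # hypothetical names

as_regex = func(files, '.tif')              # '.' matches any character
as_literal = func(files, re.escape('.tif'))  # only a literal '.tif'
print(as_regex)    # ['aet2000.tif', 'aet2000tif']
print(as_literal)  # ['aet2000.tif']
```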

Upvotes: 9

user688635

You do not need to preallocate the list tifnames or use the counter to put in elements. Just append the data to the list as generated or use a list comprehension.
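The preallocation is in fact what causes the TypeError in the question. list(range(0, N + 1)) creates N + 1 integers, but the nested loops overwrite only N of them, so the last slot is still an int, and 'aet' in <an int> raises. A Python 3 sketch that rebuilds the question's list and inspects the leftover element:

```python
years = range(2000, 2011)  # 11 years: 2000..2010
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
variables = ["cwd", "ppt", "aet", "pet", "tmn", "tmx"]

n = len(years) * len(months) * len(variables)  # 11 * 12 * 6 = 792 names
tifnames = list(range(0, n + 1))  # the "+ 1" allocates 793 slots

i = 0
for variable in variables:
    for year in years:
        for month in months:
            tifnames[i] = variable + str(year) + month + ".tif"
            i += 1

print(len(tifnames), i)    # 793 792: one slot is never overwritten
print(repr(tifnames[-1]))  # 792, an int left over from range()

try:
    aet = [name for name in tifnames if 'aet' in name]
except TypeError as exc:
    aet = None  # the membership test fails on the trailing int
    print("TypeError:", exc)
```

Dropping the + 1, or building the list by appending as below, leaves only strings in tifnames.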

That is, just do this:

import re

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]     #  *variable name*  with wildcards   
tifnames = []
for variable in variables:
   for year in years:
      for month in months:
         fullname = variable+year+month+".tif"
         tifnames.append(fullname)

print tifnames
print '==='
print filter(lambda x: re.search(r'aet',x),tifnames)

Prints:

['cwd2000jan.tif', 'cwd2000feb.tif', 'cwd2000mar.tif', 'cwd2000apr.tif', 'cwd2000may.tif', 'cwd2000jun.tif', 'cwd2000jul.tif', 'cwd2000aug.tif', 'cwd2000sep.tif', 'cwd2000oct.tif', 'cwd2000nov.tif', 'cwd2000dec.tif', 'cwd2011jan.tif', 'cwd2011feb.tif', 'cwd2011mar.tif', 'cwd2011apr.tif', 'cwd2011may.tif', 'cwd2011jun.tif', 'cwd2011jul.tif', 'cwd2011aug.tif', 'cwd2011sep.tif', 'cwd2011oct.tif', 'cwd2011nov.tif', 'cwd2011dec.tif', 'ppt2000jan.tif', 'ppt2000feb.tif', 'ppt2000mar.tif', 'ppt2000apr.tif', 'ppt2000may.tif', 'ppt2000jun.tif', 'ppt2000jul.tif', 'ppt2000aug.tif', 'ppt2000sep.tif', 'ppt2000oct.tif', 'ppt2000nov.tif', 'ppt2000dec.tif', 'ppt2011jan.tif', 'ppt2011feb.tif', 'ppt2011mar.tif', 'ppt2011apr.tif', 'ppt2011may.tif', 'ppt2011jun.tif', 'ppt2011jul.tif', 'ppt2011aug.tif', 'ppt2011sep.tif', 'ppt2011oct.tif', 'ppt2011nov.tif', 'ppt2011dec.tif', 'aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif', 'pet2000jan.tif', 'pet2000feb.tif', 'pet2000mar.tif', 'pet2000apr.tif', 'pet2000may.tif', 'pet2000jun.tif', 'pet2000jul.tif', 'pet2000aug.tif', 'pet2000sep.tif', 'pet2000oct.tif', 'pet2000nov.tif', 'pet2000dec.tif', 'pet2011jan.tif', 'pet2011feb.tif', 'pet2011mar.tif', 'pet2011apr.tif', 'pet2011may.tif', 'pet2011jun.tif', 'pet2011jul.tif', 'pet2011aug.tif', 'pet2011sep.tif', 'pet2011oct.tif', 'pet2011nov.tif', 'pet2011dec.tif', 'tmn2000jan.tif', 'tmn2000feb.tif', 'tmn2000mar.tif', 'tmn2000apr.tif', 'tmn2000may.tif', 'tmn2000jun.tif', 'tmn2000jul.tif', 'tmn2000aug.tif', 'tmn2000sep.tif', 'tmn2000oct.tif', 'tmn2000nov.tif', 'tmn2000dec.tif', 'tmn2011jan.tif', 'tmn2011feb.tif', 'tmn2011mar.tif', 
'tmn2011apr.tif', 'tmn2011may.tif', 'tmn2011jun.tif', 'tmn2011jul.tif', 'tmn2011aug.tif', 'tmn2011sep.tif', 'tmn2011oct.tif', 'tmn2011nov.tif', 'tmn2011dec.tif', 'tmx2000jan.tif', 'tmx2000feb.tif', 'tmx2000mar.tif', 'tmx2000apr.tif', 'tmx2000may.tif', 'tmx2000jun.tif', 'tmx2000jul.tif', 'tmx2000aug.tif', 'tmx2000sep.tif', 'tmx2000oct.tif', 'tmx2000nov.tif', 'tmx2000dec.tif', 'tmx2011jan.tif', 'tmx2011feb.tif', 'tmx2011mar.tif', 'tmx2011apr.tif', 'tmx2011may.tif', 'tmx2011jun.tif', 'tmx2011jul.tif', 'tmx2011aug.tif', 'tmx2011sep.tif', 'tmx2011oct.tif', 'tmx2011nov.tif', 'tmx2011dec.tif']
===
['aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif']

And, depending on whether you find it more readable, it would be more idiomatic Python to write this:

years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]   # avoid shadowing the built-in vars()
tifnames = [v+y+m+".tif" for v in variables for y in years for m in months]   # same order as the nested loops
print tifnames
print '==='
print [e for e in tifnames if re.search(r'aet',e)]

...same output
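If the nested loops feel verbose, itertools.product from the standard library yields the same combinations in the same order as nested for loops (the rightmost iterable varies fastest). A Python 3 sketch that uses the question's full year range, 2000 to 2010, rather than the two years above:

```python
import itertools

years = [str(y) for y in range(2000, 2011)]  # '2000' .. '2010'
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
variables = ["cwd", "ppt", "aet", "pet", "tmn", "tmx"]

# product() varies months fastest, then years, then variables,
# matching the variable -> year -> month nested loops
tifnames = [v + y + m + ".tif"
            for v, y, m in itertools.product(variables, years, months)]

aet = [name for name in tifnames if 'aet' in name]
print(len(tifnames), len(aet))  # 792 132
print(aet[:2])                  # ['aet2000jan.tif', 'aet2000feb.tif']
```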

Upvotes: 2

mrchampe

Reputation: 462

Try this out. It may not be the "shortest" of all the code shown, but for someone learning Python, I think it teaches more.

names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
found = []
for name in names:
    if 'aet' in name:
        found.append(name)
print found

Output

['aet2000', 'aet2001']

Edit: Changed to produce a list.

See also:

How to use Python to find out the words begin with vowels in a list?

Upvotes: 12

Florin Stingaciu

Reputation: 8285

You should look into the Python module called re. Below is a grep function implemented in Python using re. It will help you understand how re works (of course, only after you read about re).

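import re

def grep(pattern, word_list):
    expr = re.compile(pattern)  # compile once, reuse for every element
    # search() looks anywhere in the string, like grep; match() only anchors at the start
    return [elem for elem in word_list if expr.search(elem)]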
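Note that re.match() only succeeds when the pattern matches at the very beginning of the string, while re.search() scans the whole string, which is closer to what grep does. A small Python 3 comparison, with a hypothetical name 'new_aet2002' added to the question's list:

```python
import re

names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001', 'new_aet2002']
expr = re.compile('aet')

# match() anchors at position 0; search() looks anywhere in the string
by_match = [elem for elem in names if expr.match(elem)]
by_search = [elem for elem in names if expr.search(elem)]

print(by_match)   # ['aet2000', 'aet2001']
print(by_search)  # ['aet2000', 'aet2001', 'new_aet2002']
```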

Upvotes: 4
