Reputation: 5295
I have a list of file names:
names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
While I have found some functions that can work to grep character strings, I haven't figured out how to grep all elements of a list.
For instance, I would like to run:
grep(names,'aet')
and get:
['aet2000','aet2001']
I'm sure it's not too hard, but I am new to Python.
Update: The question above apparently wasn't accurate enough. All the answers below work for the example but not for my actual data. Here is my code to make the list of file names:
years = range(2000,2011)
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"] # *variable name* with wildcards
tifnames = list(range(0,(len(years)*len(months)*len(variables)+1) ))
i = 0
for variable in variables:
    for year in years:
        for month in months:
            fullname = str(variable)+str(year)+str(month)+".tif"
            tifnames[i] = fullname
            i = i+1
Running filter(lambda x:'aet' in x,tifnames) or any of the other answers raises:
Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    func(tifnames,'aet')
  File "<pyshell#88>", line 2, in func
    return [i for i in l if s in i]
TypeError: argument of type 'int' is not iterable
This happens despite the fact that tifnames appears to be a list of character strings:
type(tifnames[1])
<type 'str'>
Do you guys see what's going on here? Thanks again!
Upvotes: 44
Views: 61785
Reputation: 250941
Use filter():
>>> names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
>>> filter(lambda x:'aet' in x, names)
['aet2000', 'aet2001']
With regex:
>>> import re
>>> filter(lambda x: re.search(r'aet', x), names)
['aet2000', 'aet2001']
In Python 3, filter() returns an iterator, so call list() on it to get a list:
>>> list(filter(lambda x:'aet' in x, names))
['aet2000', 'aet2001']
Otherwise, use a list comprehension (it works in both Python 2 and 3):
>>> [name for name in names if 'aet' in name]
['aet2000', 'aet2001']
Upvotes: 70
Reputation: 80346
>>> names = ['aet2000', 'ppt2000', 'aet2001', 'ppt2001']
>>> def grep(l, s):
...     return [i for i in l if s in i]
...
>>> grep(names, 'aet')
['aet2000', 'aet2001']
Regex version, closer to grep, although not needed in this case:
>>> import re
>>> def func(l, s):
...     return [i for i in l if re.search(s, i)]
...
>>> func(names, r'aet')
['aet2000', 'aet2001']
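One thing to watch with the regex version: the search string is interpreted as a pattern, so characters such as "." have a special meaning. The sketch below is only an illustration; the literal parameter and the sample file names are assumptions added here, not part of the answer above. It uses re.escape() to switch between regex and literal matching.

```python
import re

def grep(strings, pattern, literal=False):
    """Return the strings that contain pattern (hypothetical helper)."""
    if literal:
        # Escape regex metacharacters so the pattern is matched literally.
        pattern = re.escape(pattern)
    return [s for s in strings if re.search(pattern, s)]

# Made-up sample names, chosen to show the difference.
names = ['aet2000.tif', 'aet2000_tif', 'ppt2000.tif']

as_regex = grep(names, r'0.tif')                 # '.' matches any character
as_literal = grep(names, '0.tif', literal=True)  # '.' must be a real dot

print(as_regex)
print(as_literal)
```

For plain substrings such as 'aet', the simpler `s in i` test from the first function is enough.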
Upvotes: 9
Reputation:
You do not need to preallocate the list tifnames or use a counter to put in the elements. Just append each name to the list as it is generated, or use a list comprehension.
That is, just do this:
import re
years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"] # *variable name* with wildcards
tifnames = []
for variable in variables:
    for year in years:
        for month in months:
            fullname = variable+year+month+".tif"
            tifnames.append(fullname)
# print(...) with a single argument works in both Python 2 and 3
print(tifnames)
print('===')
print(list(filter(lambda x: re.search(r'aet',x),tifnames)))
Prints:
['cwd2000jan.tif', 'cwd2000feb.tif', 'cwd2000mar.tif', 'cwd2000apr.tif', 'cwd2000may.tif', 'cwd2000jun.tif', 'cwd2000jul.tif', 'cwd2000aug.tif', 'cwd2000sep.tif', 'cwd2000oct.tif', 'cwd2000nov.tif', 'cwd2000dec.tif', 'cwd2011jan.tif', 'cwd2011feb.tif', 'cwd2011mar.tif', 'cwd2011apr.tif', 'cwd2011may.tif', 'cwd2011jun.tif', 'cwd2011jul.tif', 'cwd2011aug.tif', 'cwd2011sep.tif', 'cwd2011oct.tif', 'cwd2011nov.tif', 'cwd2011dec.tif', 'ppt2000jan.tif', 'ppt2000feb.tif', 'ppt2000mar.tif', 'ppt2000apr.tif', 'ppt2000may.tif', 'ppt2000jun.tif', 'ppt2000jul.tif', 'ppt2000aug.tif', 'ppt2000sep.tif', 'ppt2000oct.tif', 'ppt2000nov.tif', 'ppt2000dec.tif', 'ppt2011jan.tif', 'ppt2011feb.tif', 'ppt2011mar.tif', 'ppt2011apr.tif', 'ppt2011may.tif', 'ppt2011jun.tif', 'ppt2011jul.tif', 'ppt2011aug.tif', 'ppt2011sep.tif', 'ppt2011oct.tif', 'ppt2011nov.tif', 'ppt2011dec.tif', 'aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif', 'pet2000jan.tif', 'pet2000feb.tif', 'pet2000mar.tif', 'pet2000apr.tif', 'pet2000may.tif', 'pet2000jun.tif', 'pet2000jul.tif', 'pet2000aug.tif', 'pet2000sep.tif', 'pet2000oct.tif', 'pet2000nov.tif', 'pet2000dec.tif', 'pet2011jan.tif', 'pet2011feb.tif', 'pet2011mar.tif', 'pet2011apr.tif', 'pet2011may.tif', 'pet2011jun.tif', 'pet2011jul.tif', 'pet2011aug.tif', 'pet2011sep.tif', 'pet2011oct.tif', 'pet2011nov.tif', 'pet2011dec.tif', 'tmn2000jan.tif', 'tmn2000feb.tif', 'tmn2000mar.tif', 'tmn2000apr.tif', 'tmn2000may.tif', 'tmn2000jun.tif', 'tmn2000jul.tif', 'tmn2000aug.tif', 'tmn2000sep.tif', 'tmn2000oct.tif', 'tmn2000nov.tif', 'tmn2000dec.tif', 'tmn2011jan.tif', 'tmn2011feb.tif', 'tmn2011mar.tif', 
'tmn2011apr.tif', 'tmn2011may.tif', 'tmn2011jun.tif', 'tmn2011jul.tif', 'tmn2011aug.tif', 'tmn2011sep.tif', 'tmn2011oct.tif', 'tmn2011nov.tif', 'tmn2011dec.tif', 'tmx2000jan.tif', 'tmx2000feb.tif', 'tmx2000mar.tif', 'tmx2000apr.tif', 'tmx2000may.tif', 'tmx2000jun.tif', 'tmx2000jul.tif', 'tmx2000aug.tif', 'tmx2000sep.tif', 'tmx2000oct.tif', 'tmx2000nov.tif', 'tmx2000dec.tif', 'tmx2011jan.tif', 'tmx2011feb.tif', 'tmx2011mar.tif', 'tmx2011apr.tif', 'tmx2011may.tif', 'tmx2011jun.tif', 'tmx2011jul.tif', 'tmx2011aug.tif', 'tmx2011sep.tif', 'tmx2011oct.tif', 'tmx2011nov.tif', 'tmx2011dec.tif']
===
['aet2000jan.tif', 'aet2000feb.tif', 'aet2000mar.tif', 'aet2000apr.tif', 'aet2000may.tif', 'aet2000jun.tif', 'aet2000jul.tif', 'aet2000aug.tif', 'aet2000sep.tif', 'aet2000oct.tif', 'aet2000nov.tif', 'aet2000dec.tif', 'aet2011jan.tif', 'aet2011feb.tif', 'aet2011mar.tif', 'aet2011apr.tif', 'aet2011may.tif', 'aet2011jun.tif', 'aet2011jul.tif', 'aet2011aug.tif', 'aet2011sep.tif', 'aet2011oct.tif', 'aet2011nov.tif', 'aet2011dec.tif']
If you find it more readable, a list comprehension is the more idiomatic Python:
import re
years = ['2000','2011']
months = ["jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"]
variables = ["cwd","ppt","aet","pet","tmn","tmx"]  # renamed from vars, which shadows a built-in
# loop order matches the nested loops above: variable, then year, then month
tifnames = [v+y+m+".tif" for v in variables for y in years for m in months]
print(tifnames)
print('===')
print([e for e in tifnames if re.search(r'aet',e)])
...same output
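As for why the code in the question raised the TypeError: the preallocated list is one element longer than the number of names generated (note the +1), so the last slot is never overwritten and still holds an integer. The sketch below reruns the question's construction to show this; n_names and leftovers are names introduced here for illustration only.

```python
years = range(2000, 2011)  # 11 years, as in the question
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
variables = ["cwd", "ppt", "aet", "pet", "tmn", "tmx"]

n_names = len(years) * len(months) * len(variables)  # 11 * 12 * 6 = 792
tifnames = list(range(0, n_names + 1))               # 793 slots, one too many

i = 0
for variable in variables:
    for year in years:
        for month in months:
            tifnames[i] = str(variable) + str(year) + str(month) + ".tif"
            i += 1

# Any element that is still an int was never overwritten by the loop.
leftovers = [x for x in tifnames if not isinstance(x, str)]
print(len(tifnames), i, leftovers)
```

Testing `'aet' in 792` is what fails, which is why checking type(tifnames[1]) alone did not reveal the problem. Building the list with append() or a comprehension avoids the leftover element.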
Upvotes: 2
Reputation: 462
Try this out. It may not be the "shortest" of all the code shown, but for someone trying to learn Python, I think it teaches more.
names = ['aet2000','ppt2000', 'aet2001', 'ppt2001']
found = []
for name in names:
    if 'aet' in name:
        found.append(name)
print(found)
Output
['aet2000', 'aet2001']
Edit: Changed to produce list.
See also:
How to use Python to find out the words begin with vowels in a list?
Upvotes: 12
Reputation: 8285
You should look into the Python module called re. Below is a grep function implementation in Python that uses re. It will help you understand how re works (of course, only after you have read about re).
import re

def grep(pattern,word_list):
    expr = re.compile(pattern)
    # match() only matches at the start of each string; use expr.search() to match anywhere
    return [elem for elem in word_list if expr.match(elem)]
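Note that match() and search() behave differently here, which matters if the text you are looking for is not at the start of the file name. A small sketch of the difference follows; the sample names are made up for illustration.

```python
import re

# Made-up sample names; the last one contains 'aet' but does not start with it.
names = ['aet2000', 'ppt2000', 'monthly_aet2001']
expr = re.compile(r'aet')

anchored = [n for n in names if expr.match(n)]   # pattern must match at position 0
anywhere = [n for n in names if expr.search(n)]  # pattern may match at any position

print(anchored)
print(anywhere)
```

The command-line grep matches anywhere in a line, so search() is the closer equivalent.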
Upvotes: 4