ikwangz

Reputation: 13

How to extract certain elements from a string?

I have a lot of files, and I have saved all of their names to filelists.txt. Here is an example of its contents:

cpu_H1_M1_S1.out  
cpu_H1_M1_S2.out  
cpu_H2_M1_S1.out  
cpu_H2_M1_S2.out  

When the program detects _H, _M, and _S in a file name, I need it to output the numbers that follow them. For example:

_H     _M     _S  
1       1      1  
1       1      2  
2       1      1  
2       1      2  

Thank you.

Upvotes: 0

Views: 147

Answers (3)

inspectorG4dget

Reputation: 113955

Though I have nothing against regex itself, I think it's overkill for this problem. Here's a lighter solution:

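import operator

# Character positions of the three digits in names like "cpu_H1_M1_S1.out"
# (this only works while every number is a single digit).
five = operator.itemgetter(5)
eight = operator.itemgetter(8)
eleven = operator.itemgetter(11)
with open("filelists.txt") as f:
    results = [(int(five(line)), int(eight(line)), int(eleven(line))) for line in f]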
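Fixed character positions stop working once any number has more than one digit. A minimal sketch of a regex-free alternative that splits on the underscores instead, assuming every name looks like cpu_H<n>_M<n>_S<n>.out (the helper name parse_name is hypothetical, not part of the original answer):

```python
def parse_name(name):
    """Parse 'cpu_H1_M1_S1.out' into {'H': 1, 'M': 1, 'S': 1} without regex."""
    stem = name.strip().rsplit(".", 1)[0]    # drop the newline and the extension
    values = {}
    for part in stem.split("_")[1:]:         # skip the 'cpu' prefix
        # Keep only fields shaped like H<digits>, M<digits>, or S<digits>
        if part[:1] in ("H", "M", "S") and part[1:].isdigit():
            values[part[0]] = int(part[1:])
    return values


single = parse_name("cpu_H2_M1_S2.out\n")
multi = parse_name("cpu_H10_M2_S33.out")
print(single)
print(multi)
```

Because the fields are found by splitting rather than by position, multi-digit numbers are handled the same way as single digits.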

Hope that helps

Upvotes: 0

Jon Clements

Reputation: 142146

You could use a regexp:

>>> import re
>>> s = 'cpu_H2_M1_S2.out'
>>> re.findall(r'cpu_H(\d+)_M(\d+)_S(\d+)', s)
[('2', '1', '2')]

If the name doesn't match the format exactly, you get an empty list, which you can use to skip that name. Each match is a tuple of strings, so to get integers you need to convert every element of each tuple:

[tuple(int(n) for n in groups) for groups in re.findall(...)]
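Applied to the whole list, this could look roughly like the sketch below. It assumes one file name per line; the helper name extract_hms and the in-memory sample list (standing in for the lines of filelists.txt) are illustrative assumptions:

```python
import re

# Same pattern as above; each group captures one number.
PATTERN = re.compile(r'cpu_H(\d+)_M(\d+)_S(\d+)')


def extract_hms(lines):
    """Return a list of (H, M, S) integer tuples for the lines that match."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip names that do not follow the format
            results.append(tuple(int(g) for g in match.groups()))
    return results


# Sample names standing in for the contents of filelists.txt
sample = [
    "cpu_H1_M1_S1.out",
    "cpu_H1_M1_S2.out",
    "cpu_H2_M1_S1.out",
    "cpu_H2_M1_S2.out",
    "notes.txt",
]
rows = extract_hms(sample)

print("_H\t_M\t_S")
for h, m, s in rows:
    print(f"{h}\t{m}\t{s}")
```

To run it on the real list, pass the open file object instead of sample, since iterating over a file yields its lines.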

Upvotes: 2

Ashwini Chaudhary

Reputation: 250951

Something like this, using regex:

import re

with open("filelists.txt") as f:
    for line in f:
        # Grab the "_H<n>_M<n>_S<n>" part of the name, if present
        data = re.findall(r"_H\d+_M\d+_S\d+", line)
        if data:
            # Split on "_" and strip the H/M/S letters, leaving the numbers
            print([x.strip("HMS") for x in data[0].split("_")[1:]])

Output:

['1', '1', '1']
['1', '1', '2']
['2', '1', '1']
['2', '1', '2']

Upvotes: 0
