pekoms

Reputation: 63

extract and filter values from string list python

I have a list that looks like the one below. The symbol named in each error always follows the special character "‘", so I was able to extract just those symbols with something like this:

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
newlist = [x.split('‘')[1] for x in a]
print(newlist)

and the output would look like this

['ARDUINO_I2C_nI2C', 'RPY_I2C_BASE_ADDR_LIST', 'RPY_I2C_IRQ_LIST']  

But now I also need to get the name of the file related to each error. The file name always starts with a numeric substring that I also need to remove. The output I want would look like this:

   ['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c','RPY_I2C_IRQ_LIST']

I'd appreciate any suggestions. Thanks.

Upvotes: 2

Views: 89

Answers (2)

Ajay Pun Magar

Reputation: 468

I wrote this code to produce the exact result you asked for, though there may be more efficient ways. It splits each string on the error marker and uses a regex to strip the numeric prefix from the file name.

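import re
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', '248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = []
for x in a:
    # d[0] is "NNNfile:line:col", d[1] is the symbol after the quote character
    d = x.split(": error: ‘")
    # Keep the part before the first colon, then drop only the leading digits
    # (anchoring with ^ avoids removing digit runs inside the file name)
    r.append([re.sub(r"^\d+", "", d[0].split(":")[0].strip()), d[1]])
print(r)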
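
The same split-based idea can also be written without a regex. This is only a sketch: the helper name parse_error_line is made up for illustration, and it assumes the prefix to drop consists purely of digits, so a file whose real name begins with a digit would lose that digit too.

```python
def parse_error_line(line):
    # Split once around the ": error: ‘" marker.
    # head holds "NNNfile:line:col", symbol holds the text after the quote.
    head, sep, symbol = line.partition(": error: ‘")
    if not sep:
        return None  # marker not found; the caller decides what to do
    # Take the text before the first colon, trim spaces, drop leading digits.
    filename = head.split(":")[0].strip().lstrip("0123456789")
    return [filename, symbol]


a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = [parse_error_line(x) for x in a]
print(r)
```

Returning None for lines without the marker is just one possible choice; raising an exception would work equally well if every line is expected to be an error line.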

Upvotes: 0

pho

Reputation: 25489

You could use a regular expression to capture the required parts of your string. For example, the following regex (Try it online):

\d+([^:]+):.*‘(.*)$

Explanation:
-----------
\d+                     : One or more digits
   (     )    (  )      : Capturing groups
    [^:]+               : One or more non-colon characters (in capturing group 1)
          :             : One colon
           .*           : Any number of any character
             ‘          : The ‘ character
               .*       : Any number of any character (in capturing group 2)
                  $     : End of string

To use it:

import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

# `a` is the list of strings from the question
newlist = [regex.search(s).groups() for s in a]

which gives a list of tuples:

[('ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'),
 ('rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'),
 ('rpy_i2c.c', 'RPY_I2C_IRQ_LIST')]

If you really want a list of lists, you can convert the result of .groups() to a list:

newlist = [list(regex.search(s).groups()) for s in a]
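
Note that regex.search() returns None when a string does not match, so the comprehensions above would raise an AttributeError on such a line. Below is a rough sketch of one way to skip those lines, using the same pattern written with re.VERBOSE and named groups. The group names file and symbol and the extra non-matching sample line are my own additions for illustration.

```python
import re

# Same pattern as above, spread out with re.VERBOSE and named groups.
# The group names "file" and "symbol" are illustrative only.
regex = re.compile(
    r"""
    \d+                 # leading numeric prefix to discard
    (?P<file>[^:]+)     # file name: everything up to the first colon
    :.*‘                # skip ahead to the last ‘ character
    (?P<symbol>.*)$     # the rest of the line
    """,
    re.VERBOSE,
)

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST',
     'In file included from main.c',  # hypothetical line that does not match
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']

newlist = []
for s in a:
    m = regex.search(s)
    if m is None:
        continue  # skip lines that are not in the expected format
    newlist.append([m.group("file"), m.group("symbol")])

print(newlist)
```

Whether to skip or report non-matching lines depends on how clean the compiler output is expected to be.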

Upvotes: 1
