Bambi

Reputation: 815

Remove part of 'one word' string Python

I have this list of part-of-speech tags and their specifics: ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']. As you can see, there are no spaces between the characters, so each item can be seen as one word.

Now I need a new list with only the part-of-speech tags, like this: ['VNW', 'WW', 'LID']. I tried removing the brackets and everything in them with a regex like this: pattern = re.compile(r'(.*)').
I also tried to match only the capital letters, but I can't get it right. Any suggestions?

Upvotes: 0

Views: 223

Answers (4)

SmartElectron

Reputation: 1451

For example:

In [102]: s=['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
In [103]: [x.split('(', 1)[0] for x in s]
Out[103]: ['VNW', 'WW', 'LID']

Upvotes: 0

Ankit

Reputation: 130

Some possible solutions are:

Removing the brackets using a loop

l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
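for i in range(len(l)):
    # Locate the opening and closing parentheses
    i1, i2 = l[i].find('('), l[i].find(')')
    # Keep the text before '(' and anything after ')'
    l[i] = l[i][:i1] + l[i][i2+1:]
print(l)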

Using a regex

import re
pattern = r'\([^)]*\)'
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
for i in range(len(l)):
    l[i] = re.sub(pattern, '', l[i])
print(l)

Output: ['VNW', 'WW', 'LID']
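
As a side note on the pattern from the question: in r'(.*)' the parentheses are unescaped, so they form a capture group instead of matching literal brackets, and the pattern matches the whole string. The capital-letters idea from the question can also work. The following is only a sketch, assuming every tag consists of uppercase ASCII letters at the start of the string; the names TAG_PATTERN and leading_tag are made up for illustration.

```python
import re

tags = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']

# Unescaped parentheses create a group, so this matches the entire string.
whole = re.match(r'(.*)', tags[1]).group(0)

# Hypothetical helper: match the run of capital letters at the start.
TAG_PATTERN = re.compile(r'[A-Z]+')

def leading_tag(item):
    """Return the leading uppercase run, or the item unchanged if there is none."""
    m = TAG_PATTERN.match(item)
    return m.group(0) if m else item

result = [leading_tag(t) for t in tags]
print(whole)
print(result)
```

Unlike the loop above, this builds a new list and leaves the original untouched.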

Upvotes: 1

RomanPerekhrest

Reputation: 92854

A short solution using the str.find() function:

l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
result = [i[:i.find('(')] for i in l]

Contents of result:

['VNW', 'WW', 'LID']
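
One caveat worth knowing: str.find() returns -1 when there is no '(' in the string, so the slice would silently drop the last character of such an item. The sketch below compares it with str.partition(), which returns the whole string as the first element when the separator is missing. The extra entry 'SPEC' is a hypothetical bare tag added only to show the difference; the list in the question has no such item.

```python
# 'SPEC' is a made-up entry without parentheses, added for illustration.
tags = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)',
        'LID(bep,stan,rest)', 'SPEC']

# find() gives -1 for 'SPEC', so the slice [: -1] cuts off its last letter.
with_find = [t[:t.find('(')] for t in tags]

# partition() splits at the first '(' and keeps the full string if none exists.
with_partition = [t.partition('(')[0] for t in tags]

print(with_find)
print(with_partition)
```

If every item is guaranteed to contain a '(', both versions give the same result.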

Upvotes: 0

falsetru

Reputation: 369274

A regular expression is not needed for this case. Split on (, then take only the first part.

>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')
['VNW', 'pers,pron,nomin,red,2v,ev)']
>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')[0]
'VNW'

>>> xs = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)',
...       'LID(bep,stan,rest)']
>>> [x.split('(')[0] for x in xs]
['VNW', 'WW', 'LID']

Upvotes: 3
