Reputation: 815
I have this list with part-of-speech tags and their specifics: ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']. As you can see, there are no spaces between the characters, so each item can be seen as one word.
Now I need a new list with only the part-of-speech tags, like this: ['VNW', 'WW', 'LID'].
I tried removing the brackets and everything in them with a regex like this: pattern = re.compile(r'(.*)')
I also tried to match only the capital letters, but I can't get it right. Suggestions?
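As written, (.*) is a capturing group rather than a pair of literal brackets, so it matches the whole string; literal brackets have to be escaped, as in \(.*\). The capital-letters idea can work too. A minimal sketch, assuming every tag starts with one or more uppercase ASCII letters (true for the sample list, not guaranteed for every tag set); pos_tags and LEADING_CAPS are hypothetical names:

```python
import re

# Hypothetical helper pattern: one or more uppercase ASCII letters.
# Assumes every tag starts with at least one capital A-Z.
LEADING_CAPS = re.compile(r'[A-Z]+')

def pos_tags(items):
    """Return the leading run of capitals of each item (the POS tag)."""
    tags = []
    for item in items:
        m = LEADING_CAPS.match(item)  # match() only looks at the start of the string
        tags.append(m.group() if m else item)  # no capitals at the start: keep the item
    return tags

s = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
print(pos_tags(s))  # ['VNW', 'WW', 'LID']
```

Because match() is anchored at the start, the lowercase specifics inside the brackets are never reached.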
Upvotes: 0
Views: 223
Reputation: 1451
For example:
In [102]: s=['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
In [103]: [x.split('(', 1)[0] for x in s]
Out[103]: ['VNW', 'WW', 'LID']
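str.partition is a close alternative: it splits at the first '(' only and always returns a 3-tuple (head, separator, tail), so the head is the bare tag even when an item has no bracketed part. A short sketch; the bracket-less 'ADJ' item is made up to show that case:

```python
s = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)', 'ADJ']

# partition('(') returns (text before '(', '(', text after '('); keep only the head.
# For 'ADJ' there is no '(', so partition returns ('ADJ', '', '').
tags = [x.partition('(')[0] for x in s]
print(tags)  # ['VNW', 'WW', 'LID', 'ADJ']
```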
Upvotes: 0
Reputation: 130
Some of the possible solutions are:
Removing the brackets using a loop
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
for i in range(len(l)):
    # positions of the opening and closing bracket
    i1, i2 = l[i].find('('), l[i].find(')')
    # keep the text before '(' and after ')'
    l[i] = l[i][:i1] + l[i][i2+1:]
print(l)
Using Regex
import re
# a literal '(', then any characters other than ')', then the closing ')'
pattern = r'\([^)]*\)'
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
for i in range(len(l)):
    l[i] = re.sub(pattern, '', l[i])
print(l)
Output: ['VNW', 'WW', 'LID']
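The regex loop can also be written as a list comprehension with the pattern compiled once. Since [^)]* cannot cross a ')', each match stops at the first closing bracket. A sketch, assuming the brackets are never nested; BRACKETED is a hypothetical name:

```python
import re

# \( and \) are literal brackets; [^)]* matches everything up to the first ')'
BRACKETED = re.compile(r'\([^)]*\)')

l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
tags = [BRACKETED.sub('', x) for x in l]  # builds a new list instead of mutating l
print(tags)  # ['VNW', 'WW', 'LID']
```

Unlike the split-based answers, re.sub only deletes the bracketed part, so any text after ')' would be kept.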
Upvotes: 1
Reputation: 92854
Short solution using the str.find() function:
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
result = [i[:i.find('(')] for i in l]
The contents of result:
['VNW', 'WW', 'LID']
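One caveat with str.find(): it returns -1 when '(' is absent, and i[:-1] then silently drops the last character (a bare 'ADJ' would become 'AD'). If some items might lack brackets, a guard avoids that. A sketch; tag_of is a hypothetical helper and the 'ADJ' item is made up:

```python
l = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'ADJ']

def tag_of(item):
    """Return the text before the first '(' or the whole item if there is none."""
    pos = item.find('(')  # -1 if there is no '('
    return item if pos == -1 else item[:pos]

result = [tag_of(i) for i in l]
print(result)  # ['VNW', 'WW', 'ADJ']
```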
Upvotes: 0
Reputation: 369274
A regular expression is not needed in this case. Split by '(', then take the first part only.
>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')
['VNW', 'pers,pron,nomin,red,2v,ev)']
>>> 'VNW(pers,pron,nomin,red,2v,ev)'.split('(')[0]
'VNW'
>>> xs = ['VNW(pers,pron,nomin,red,2v,ev)', 'WW(pv,tgw,met-t)', 'LID(bep,stan,rest)']
>>> [x.split('(')[0] for x in xs]
['VNW', 'WW', 'LID']
Upvotes: 3