Reputation: 43
I am new to Python and am looking for help extracting tags from a string, given an array of tag strings. Let's say I have the string array ['python', 'c#', 'java', 'f#']
and the input string "I love Java and python".
The output should be the array ['java', 'python'].
Thanks for any help.
Upvotes: 2
Views: 129
Reputation: 18906
import re
stringarray = ['python', 'c#', 'core java', 'f#' ]
string = "I love Core Java and python"
# re.escape makes regex metacharacters in a tag (e.g. '+' or '.') match literally
pattern = '|'.join(map(re.escape, stringarray))
output = re.findall(pattern, string.lower())
# ['core java', 'python']
stringarray = ['python', 'c#', 'core java', 'f#' ]
string = "I love Core Java and python"
# substring test: keeps every tag that occurs anywhere in the lower-cased string
output = [i for i in stringarray if i in string.lower()]
# ['core java', 'python']
stringarray = ['python', 'c#', 'java', 'f#' ]
string = "I love Java and python"
output = list(set(string.lower().split()).intersection(stringarray))
# ['java', 'python'] (a set is unordered, so the order may differ)
Short explanation: string.lower().split() lower-cases the input string and splits it into words on whitespace (the default separator). Converting the result to a set gives access to the set method intersection, which finds the elements that occur in both collections. Finally, we wrap the result in a list to get the desired output. As Joe Iddon commented, this will not return repeated tags.
Are you interested in counts? Consider using collections.Counter and a dict comprehension:
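To make the point about repeated tags concrete, here is a small sketch comparing the set intersection with a plain list comprehension. The input sentence and the variable names are made up for illustration; they are not from the question.

```python
tags = ['python', 'c#', 'java', 'f#']
text = "Python and java and more python"  # hypothetical input with a repeated tag

words = text.lower().split()

# Set intersection: each tag appears at most once, and the order is not defined
unique_tags = set(words).intersection(tags)

# List comprehension: keeps repeats and the order in which the words appear
tag_set = set(tags)
all_matches = [w for w in words if w in tag_set]

print(sorted(unique_tags))  # ['java', 'python']
print(all_matches)          # ['python', 'java', 'python']
```

If you need the repeats or the original order, the list comprehension is probably the better fit; if you only care which tags are present, the set version is enough.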
from collections import Counter
count = {k:v for k,v in Counter(string.lower().split()).items() if k in stringarray}
print(count)
#{'java': 1, 'python': 1}
Upvotes: 4
Reputation: 51335
You could use the following list comprehension, which converts your string to lowercase, iterates through each word (after using split), and returns the ones that are in your array:
arr = ['python', 'c#', 'java', 'f#' ]
s = "I love Java and python"
outp = [i for i in s.lower().split() if i in arr]
>>> outp
['java', 'python']
Or you could use regular expressions:
import re
arr = ['python', 'c#', 'java', 'f#' ]
s = "I love Java and python"
outp = re.findall('|'.join(map(re.escape, arr)), s.lower())  # re.escape keeps characters like '+' literal
>>> outp
['java', 'python']
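Two caveats with the regex approach are worth sketching: a tag containing regex metacharacters (say, a hypothetical 'c++' tag) is interpreted as regex syntax unless it is escaped, and a bare alternation also matches inside longer words (so 'java' would be found in 'javascript'). The helper below is only a sketch under those assumptions; find_tags and the sample sentence are my own names, not part of the question.

```python
import re

def find_tags(text, tags):
    # Try longer tags first so a multi-word tag wins over a shorter one it contains
    ordered = sorted(tags, key=len, reverse=True)
    # re.escape makes characters such as '+' and '#' match literally
    alternatives = '|'.join(re.escape(t) for t in ordered)
    # The lookarounds reject matches that are glued to another word character
    pattern = r'(?<!\w)(?:' + alternatives + r')(?!\w)'
    return re.findall(pattern, text.lower())

print(find_tags("I love Java, C++ and JavaScript", ['python', 'c++', 'java', 'f#']))
# ['java', 'c++']
```

Lookarounds are used instead of \b because \b does not behave as a word boundary next to non-word characters such as '#' or '+', which are common at the end of tags like 'c#'.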
Upvotes: 3
Reputation: 20414
Turn your tags list into a set, so that each membership check is O(1) on average, and then use a list comprehension to filter the words. The whole search is then O(n) in the number of words in the string.
def extract(string, tags):
    tags = set(tags)
    return [w for w in string.lower().split() if w in tags]
and a test:
>>> extract('I love Java and python', ['python', 'c#', 'java', 'f#' ])
['java', 'python']
Upvotes: 2