Reputation: 399
I'm new to Python. I've written a tokenize function that takes a text file of sentences and splits them on whitespace and punctuation. The problem is that the output contains sublists nested inside a parent list.
My code:
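import re

def tokenize(document):
    # Note: the file name is hard-coded; the document argument is not used.
    file = open("document.txt")
    text = file.read()
    hey = text.lower()
    # Split on runs of two or more whitespace characters.
    words = re.split(r'\s\s+', hey)
    print([re.findall(r'\w+', b) for b in words])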
My output:
[['what', 's', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'eggs', 'warden'], ['his', 'dad', 'was', 'warden', 'in', 'the', 'kitchen', 'poaching', 'eggs']]
Desired Output:
['what', 's', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'eggs', 'warden']['his', 'dad', 'was', 'warden', 'in', 'the', 'kitchen', 'poaching', 'eggs']
How do I remove the parent list from my output? What changes do I need to make to my code in order to remove the outer list brackets?
Upvotes: 0
Views: 73
Reputation: 97
Here is an example that I think is close to your problem, where I take each sublist out of the parent list by its index:
>>> a = [['sa', 'bbb', 'ccc'], ['dad', 'des', 'kkk']]
>>>
>>> print a[0], a[1]
['sa', 'bbb', 'ccc'] ['dad', 'des', 'kkk']
>>>
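The same idea extends to any number of sublists without indexing each one. This is only a minimal sketch, assuming Python 3 (where print is a function); the variable name `line` is just for illustration.

```python
a = [['sa', 'bbb', 'ccc'], ['dad', 'des', 'kkk']]

# Unpack the outer list so each sublist is passed to print() as its own argument.
print(*a)

# Equivalent string built by hand, useful if you need the text rather than console output.
line = ' '.join(str(sub) for sub in a)
print(line)
```

Both print calls show the sublists side by side with no enclosing brackets.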
Upvotes: 0
Reputation: 174614
You said: "I want them as individual lists."
A function in Python can only return one value. If you want to return two things (in your case, two lists of words), you have to return an object that can hold both, such as a list, a tuple, or a dictionary.
Do not confuse how you want to print the output with what object is returned.
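To make the distinction concrete, here is a small sketch (Python 3 assumed; `split_pair` is a hypothetical helper, not part of your code). It shows that a function returning "two things" really returns one tuple, which the caller can then unpack.

```python
def split_pair(words):
    """Return the first and second half of a list as one tuple."""
    middle = len(words) // 2
    # The comma builds a single tuple object holding both lists.
    return words[:middle], words[middle:]

result = split_pair(['his', 'dad', 'was', 'warden'])
print(type(result).__name__)  # the one returned object is a tuple

# Unpacking gives each list its own name.
first, second = result
print(first)
print(second)
```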
To simply print the lists:
for b in words:
    print(re.findall(r'\w+', b))
If you do this, then your method doesn't return anything (it actually returns None).
To return both the lists:
return [re.findall(r'\w+', b) for b in words]
Then call your method like this:
word_lists = tokenize(document)
for word_list in word_lists:
    print(word_list)
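Putting this together, a self-contained sketch might look like the following. It assumes Python 3, takes the text as a string instead of reading document.txt so it runs anywhere, and uses a made-up sample string; adapt it to your file.

```python
import re

def tokenize(text):
    """Split text into blocks on runs of 2+ whitespace characters, then into lowercase words."""
    blocks = re.split(r'\s\s+', text.lower().strip())
    return [re.findall(r'\w+', block) for block in blocks]

# Hypothetical sample input; two sentences separated by a blank line.
sample = "What did the boy say?\n\nHis dad was in the kitchen."

word_lists = tokenize(sample)
for word_list in word_lists:
    print(word_list)
```

Each sublist is printed on its own line with no outer brackets, while the function still returns a single list that other code can reuse.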
Upvotes: 2
Reputation: 447
This should work (each sublist is converted to a string first, because join only accepts strings):
print(','.join(str(re.findall(r'\w+', b)) for b in words))
Upvotes: 0