Reputation: 405
For my bachelor's thesis I need to shuffle sentences in a text corpus.
The data looks like this:
[
['1', '$', '0-', '$', '10', 'Culture', ':', 'Play', 'Your', 'Way', 'to', 'China', '.'],
['2', '02.59', 'The', 'press', 'are', 'being', 'kept', 'well', 'away', 'as', 'the', 'couple', 'meet', '21', 'local', 'dignitaries', ',', 'reports', 'the', 'BBC', "'s", 'Peter', 'Hunt', ':', 'An', 'official', 'insisted', 'all', 'journalists', 'stand', 'inside', 'a', 'pen', 'at', 'a', 'deserted', 'airport', 'runway', '.'],
['3', '€0.25', ')', 'plus', 'PLN', '1', 'booking', 'fee', 'and', 'comfortable', 'vehicles', '.'],
['4', '0', "'", '6', "''", 'x', '7', "'", '6', "''", '(', '0.17m', 'x', '2.31m', ')', 'Double', 'glazed', 'window', 'to', 'the', 'side', ',', 'heated', 'towel', 'rail', ',', 'ceramic', 'floor', 'tiles', ',', 'fully', 'tiled', 'walls', ',', 'spotlights', ',', 'low', 'level', 'W/C', ',', 'shower', 'unit', 'with', 'glass', 'surround', ',', 'sink', 'with', 'mixer', 'tap', ',', 'extractor', 'fan', '.'],
['5', '07:00', 'am', '-', 'Mon', ',', 'September', '19', '2011', 'I', 'already', 'have', 'the', 'Keystone', 'pipeline', 'running', 'through', 'my', 'properiety', 'this', 'is', 'Keystone', 'XL', 'or', 'extra', 'large', '.']
]
I have tried shuffle from the random module (from random import shuffle) and also numpy.random.shuffle, but all my minimal examples only work with lists of ints, not with lists of strings.
Here is my latest attempt:
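import numpy as np
# Assumption: sent_tokenize and word_tokenize are NLTK's tokenizers
from nltk.tokenize import sent_tokenize, word_tokenize

# Read the corpus and split it into a list of token lists
raw = open('eng_news_2016_300K-sentences.txt').read()
eng3Cor = [word_tokenize(sent) for sent in sent_tokenize(raw)]
eng3Cor = eng3Cor[:5]  # keep only the first five sentences
del raw
y = np.array([np.array(xi, dtype=object) for xi in eng3Cor], dtype=object)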
Any advice on how to do this?
EDIT: eng3Cor is the list of lists.
Upvotes: 0
Views: 37
Reputation: 782682
random.shuffle() works with lists of any data type, not just lists of ints.
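A quick way to check this is a minimal sketch with a list of strings. The list words below is made up for illustration and is not taken from your corpus. Note that random.shuffle() reorders the list in place and returns None, so writing something like words = random.shuffle(words) would throw the data away, which may be what went wrong in the minimal examples:

```python
import random

# Hypothetical example data, not taken from the corpus
words = ['The', 'press', 'are', 'being', 'kept', 'well', 'away']
original = list(words)  # keep a copy for comparison

result = random.shuffle(words)  # shuffles the list in place

print(result)  # None: shuffle() does not return the shuffled list
print(sorted(words) == sorted(original))  # True: same tokens, possibly new order
```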
This will shuffle the words in each sentence:
import random
for sent in eng3Cor:
    random.shuffle(sent)
This will just shuffle the order of the sentences:
import random
random.shuffle(eng3Cor)
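If the original order has to stay intact, or the shuffle should be reproducible for the thesis, one possible variation is sketched below. It assumes eng3Cor is the list of token lists from the question (a small stand-in is used here). The helper names shuffled_sentences and shuffled_words and the seed value 42 are made up for this example. random.sample() with k equal to the list length returns a new shuffled list and leaves its input untouched:

```python
import random

def shuffled_sentences(corpus, seed=None):
    """Return a new list with the sentences in random order.

    The inner token lists are not copied, only reordered.
    """
    rng = random.Random(seed)  # private generator, reproducible if seed is set
    return rng.sample(corpus, k=len(corpus))

def shuffled_words(corpus, seed=None):
    """Return new sentences whose tokens are in random order."""
    rng = random.Random(seed)
    return [rng.sample(sent, k=len(sent)) for sent in corpus]

# Small stand-in for the real corpus (assumption: a list of token lists)
eng3Cor = [
    ['1', 'Play', 'Your', 'Way', 'to', 'China', '.'],
    ['2', 'The', 'press', 'are', 'being', 'kept', 'well', 'away', '.'],
    ['3', 'plus', 'PLN', '1', 'booking', 'fee', '.'],
]
backup = [list(sent) for sent in eng3Cor]

by_sentence = shuffled_sentences(eng3Cor, seed=42)
by_word = shuffled_words(eng3Cor, seed=42)

print(eng3Cor == backup)  # True: the original corpus is unchanged
```

Using the same seed again gives the same order, which can help when results need to be repeatable. Keep in mind that in your data the first token of each sentence looks like a running number, and shuffling the words will move it along with everything else.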
Upvotes: 3