Reputation: 70
I am creating a string that is about 30 million words long. As you can imagine, building it with a for-loop that appends roughly 100 words per iteration takes absolutely forever. Is there a way to represent the string in a more memory-friendly way, such as a numpy array? I have very little experience with numpy.
bigStr = ''
for tweet in df['text']:
    bigStr = bigStr + ' ' + tweet  # copies the entire string on every iteration
len(bigStr)
Upvotes: 0
Views: 790
Reputation: 531135
If you want to build a string, use ' '.join, which creates the final string in O(n) time, rather than building it up one piece at a time, which takes O(n^2) time.
bigStr = ' '.join(df['text'])  # join accepts any iterable of strings, so a list comprehension is unnecessary
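As a rough, self-contained illustration of the difference (using a hypothetical tweets list in place of df['text']), you can time both approaches with timeit:

import timeit

# Hypothetical stand-in for df['text']: 100,000 short strings.
tweets = ['example tweet text'] * 100_000

def build_with_loop():
    big = ''
    for t in tweets:
        big = big + ' ' + t  # copies the growing string every iteration: O(n^2)
    return big

def build_with_join():
    return ' '.join(tweets)  # one pass over the data: O(n)

print('loop:', timeit.timeit(build_with_loop, number=1))
print('join:', timeit.timeit(build_with_join, number=1))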
Upvotes: 1
Reputation: 2399
I can see you're trying to get the total length of all the data. For that you don't need to concatenate the strings at all (and note that you add a whitespace for each element). Just take the length of each tweet and add it to an integer counter (+1 for each whitespace):
total_length = 0
for tweet in df['text']:
    total_length += 1 + len(tweet)  # +1 accounts for the space that would separate tweets
print(total_length)
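A minimal alternative sketch, assuming df['text'] is a pandas Series of strings: the same total can be computed as a single expression, either with a generator or with pandas' vectorized string methods.

# Generator expression: sum each tweet's length plus one for its leading space
total_length = sum(1 + len(tweet) for tweet in df['text'])

# Or vectorized in pandas
total_length = int((df['text'].str.len() + 1).sum())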
Upvotes: 0