Reputation: 1375
I'm trying to create a Python class that stores some text documents along with metadata for each document. Think of a structure like this:
ID  Text                     Date      Followers
1   "This is a tweet"        10/21/14  57
2   "This is another tweet"  10/22/14  100
3   "Yet another"            10/23/14  3899
4   "Another one"            10/25/14  234
What's the best and most memory-efficient way to store data like this? As four different lists (one per column, for example)? As a dictionary and/or tuples? Or as a pandas DataFrame?
Are there significant differences between these options? I would like to use a pandas DataFrame just for ease of working with the data, but I also want to be mindful of memory usage and speed for larger datasets.
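To make the options concrete, here is a minimal sketch of the plain-Python layouts I have in mind, using only the standard library. The names `Tweet` and `rough_size` are made up for illustration, and the numbers from `sys.getsizeof` are only rough estimates that vary by Python version and platform. For a DataFrame, `df.memory_usage(deep=True)` should give the comparable per-column figure.

```python
import sys
from collections import namedtuple

# Sample rows taken from the table above.
rows = [
    (1, "This is a tweet", "10/21/14", 57),
    (2, "This is another tweet", "10/22/14", 100),
    (3, "Yet another", "10/23/14", 3899),
    (4, "Another one", "10/25/14", 234),
]

# Option 1: four parallel lists, one per column.
ids, texts, dates, followers = (list(col) for col in zip(*rows))

# Option 2: a dictionary keyed by ID, holding a tuple of the other fields.
by_id = {row[0]: row[1:] for row in rows}

# Option 3: a list of named tuples (hypothetical record type "Tweet").
Tweet = namedtuple("Tweet", ["id", "text", "date", "followers"])
tweets = [Tweet(*row) for row in rows]


def rough_size(obj, seen=None):
    """Rough recursive size in bytes of nested lists, tuples, and dicts.

    Objects are counted once even if referenced several times.
    """
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(rough_size(k, seen) + rough_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple)):
        size += sum(rough_size(item, seen) for item in obj)
    return size


layouts = {
    "four lists": [ids, texts, dates, followers],
    "dict of tuples": by_id,
    "list of namedtuples": tweets,
}
for name, layout in layouts.items():
    print(f"{name}: ~{rough_size(layout)} bytes")
```

With only four rows the differences are dominated by container overhead, so a comparison like this is only meaningful when run on a realistically sized sample.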
Upvotes: 3
Views: 2953
Reputation: 60756
Your question is really too broad to answer simply. However, I can share a few thoughts.
I tend to think of my data in only three buckets, based on size:
We could spend forever talking about which framework or data structure to use for each of these three buckets. However, I've found that for my analytical work, 90% of the time it's simple:
I only look for a data structure other than the above if I have a compelling reason.
I hope that helps a bit.
Upvotes: 5