eWizardII

Reputation: 1936

50 million+ Rows of Data - CSV or MySQL

I have a CSV file that is about 1 GB and contains about 50 million rows of data. I am wondering whether it is better to keep it as a CSV file or to store it in some form of database. I don't know a great deal about MySQL, so I can't argue for why I should use it (or another database framework) over just keeping the CSV file. I am basically doing a Breadth-First Search with this dataset: once I have the initial "seed" set, I use those 50 million rows as the first values in my queue.

Thanks,

Upvotes: 2

Views: 1613

Answers (5)

Hugh Bothwell

Reputation: 56634

From your previous questions, it looks like you are doing social-network searches against Facebook friend data, so I presume your data is a set of 'A is-friend-of B' statements and you are looking for the shortest connection between two individuals?

If you have enough memory, I would suggest parsing your CSV file into a dictionary of lists. See "Can this breadth-first search be made faster?"
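A minimal sketch of the dictionary-of-lists approach, assuming each CSV row is a two-column `a,b` friendship and that friendship is symmetric. The function names `load_adjacency` and `shortest_path` are hypothetical; whether 50 million edges fit in memory this way depends on your RAM and the length of the IDs.

```python
import csv
import io
from collections import defaultdict, deque


def load_adjacency(lines):
    """Build an undirected adjacency list (dict of lists) from 'a,b' CSV rows."""
    graph = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0], row[1]
        graph[a].append(b)
        graph[b].append(a)  # assumption: friendship is symmetric
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk the parent pointers back to reconstruct the path
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


sample = io.StringIO("alice,bob\nbob,carol\ncarol,dave\nalice,erin\n")
graph = load_adjacency(sample)
print(shortest_path(graph, "alice", "dave"))  # ['alice', 'bob', 'carol', 'dave']
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` sample.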

If you cannot hold all the data at once, a local-storage database like SQLite is probably your next-best alternative.
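A rough sketch of the SQLite route using Python's built-in `sqlite3` module, assuming the same two-column `a,b` rows. The table name `edges`, its columns, and the helper functions are hypothetical. The point is that the index on `src` lets each BFS step fetch a node's neighbours without scanning all 50 million rows.

```python
import csv
import io
import sqlite3


def load_edges(conn, lines):
    """Bulk-load 'a,b' CSV rows into an edges table, then index the lookup column."""
    conn.execute("CREATE TABLE IF NOT EXISTS edges (src TEXT, dst TEXT)")
    rows = ((r[0], r[1]) for r in csv.reader(lines) if len(r) >= 2)
    conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", rows)
    # Creating the index after the bulk insert is usually faster than maintaining it per row
    conn.execute("CREATE INDEX IF NOT EXISTS idx_edges_src ON edges (src)")
    conn.commit()


def neighbours(conn, node):
    """Return the neighbours of node, looked up via the index on src."""
    cur = conn.execute("SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node,))
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")  # use a file path (e.g. a hypothetical "edges.db") to persist on disk
load_edges(conn, io.StringIO("alice,bob\nalice,erin\nbob,carol\n"))
print(neighbours(conn, "alice"))  # ['bob', 'erin']
```

If friendships are symmetric you would also insert each row reversed, or query `dst` as well.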

There are also some Python modules which might help:

Upvotes: 1

Jochen Ritzel

Reputation: 107598

If you want to search on something graph-ish (since you mention Breadth-First Search) then a graph database might prove useful.

Upvotes: 2

dmytrivv

Reputation: 608

How about a key-value store such as MongoDB?

Upvotes: 0

Tom Neyland

Reputation: 6968

I would say that there is a wide variety of benefits to using a database over a CSV file for such large structured data, so I would suggest that you learn enough to use one. However, based on your description, you might want to check out serverless, lighter-weight databases such as SQLite or something similar to JavaDB/Derby, or, depending on the structure of your data, a non-relational (NoSQL) database. Obviously you will need one with some kind of Python support.

Upvotes: 3

Laurence Gonsalves

Reputation: 143084

Are you just going to slurp in everything all at once? If so, then CSV is probably the way to go. It's simple and works.
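For the "slurp everything at once" case, a minimal sketch with the standard `csv` module that reads every row in one pass and uses the rows as the initial BFS queue, as the question describes. The function name `load_seed_queue` and the file name in the comment are hypothetical.

```python
import csv
import io
from collections import deque


def load_seed_queue(f):
    """Read every non-empty CSV row into memory and use the rows as the initial BFS queue."""
    return deque(tuple(row) for row in csv.reader(f) if row)


# With a real file: load_seed_queue(open("data.csv", newline=""))  (filename is hypothetical)
queue = load_seed_queue(io.StringIO("alice,bob\n\nbob,carol\n"))
print(len(queue), queue[0])  # 2 ('alice', 'bob')
```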

If you need to do lookups, then something that lets you index the data, like MySQL, would be better.

Upvotes: 1
