Reputation: 33
I have to store a large data table with about 10 million rows and several columns. What I need to do can be summarized as follows:
1. Based on the values in the columns, I need to select some of the rows.
Example:
row 500 : |10|3|4|5|100|314|45|
row 501 : |13|5|7|4|160|210|40|
row 501 : |24|3|8|6|260|810|50|
row 602 : |34|7|9|6|350|760|10|
Here, the first column value can be considered the row ID, so the IDs are 10, 13, 24, and 34.
Suppose I am searching for the rows that have a value >= 5 in their 4th column. After filtering, the output will be:
row 500 : |10|3|4|5|100|314|45|
row 501 : |24|3|8|6|260|810|50|
row 602 : |34|7|9|6|350|760|10|
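As a rough illustration of this filtering step in Java, here is a minimal sketch that assumes the table fits in memory as an `int[][]`. The class name `RowFilter` and the method `filterByColumn` are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowFilter {
    // Returns the rows whose value in the given 0-based column is >= threshold.
    // The returned list shares the row arrays with the input table (no copies).
    static List<int[]> filterByColumn(int[][] table, int column, int threshold) {
        List<int[]> selected = new ArrayList<>();
        for (int[] row : table) {
            if (row[column] >= threshold) {
                selected.add(row);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        int[][] table = {
            {10, 3, 4, 5, 100, 314, 45},
            {13, 5, 7, 4, 160, 210, 40},
            {24, 3, 8, 6, 260, 810, 50},
            {34, 7, 9, 6, 350, 760, 10},
        };
        // The 4th column is index 3 in 0-based indexing.
        for (int[] row : filterByColumn(table, 3, 5)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A single pass like this is linear in the number of rows, so a plain loop over 10 million rows is usually not the bottleneck.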
2. In the second step I need to compare the selected rows with each other column-wise. For example, row 500 has the values 3 and 4 in its 2nd and 3rd columns, and this range (3-4) falls within the range (3-8) of the row with ID 24. However, it does not overlap with the range (7-9) of the row with ID 34.
So there is a relation between row 500 and row 501, and the output will be:
10 24
24 34
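One possible reading of this step is that two rows are related when the closed ranges formed by their 2nd and 3rd columns overlap. The sketch below assumes that reading, and also assumes the 2nd column is never larger than the 3rd. The names `RangeOverlap` and `overlappingPairs` are made up for the example. It sorts the rows by range start so that the inner loop can stop early instead of comparing every pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RangeOverlap {
    // Each row is {id, start, end, ...}; assumes start <= end in every row.
    // Returns {idA, idB} pairs whose closed ranges [start, end] overlap.
    static List<int[]> overlappingPairs(int[][] rows) {
        int[][] sorted = rows.clone(); // shallow copy, so the caller's order is kept
        Arrays.sort(sorted, Comparator.comparingInt((int[] r) -> r[1]));
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.length; i++) {
            // Later rows start no earlier than row i, so they overlap row i
            // exactly while their start is <= the end of row i.
            for (int j = i + 1; j < sorted.length && sorted[j][1] <= sorted[i][2]; j++) {
                pairs.add(new int[] {sorted[i][0], sorted[j][0]});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[][] filtered = {
            {10, 3, 4, 5, 100, 314, 45},
            {24, 3, 8, 6, 260, 810, 50},
            {34, 7, 9, 6, 350, 760, 10},
        };
        for (int[] pair : overlappingPairs(filtered)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

The cost is the sort plus the number of reported pairs, which matters at this scale because a naive all-pairs comparison grows quadratically with the number of selected rows.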
3. Suppose I am given the value 10. Then I need to find the row that has 10 in its first column and reduce the value in its 7th column by 5. The row will now look like:
row 500 : |10|3|4|5|100|314|40|
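For this lookup-and-update step, one option is to build a hash index from ID to row once, so that each later update does not need to scan the table. This is only a sketch, assuming the IDs in the first column are unique. `IdIndex` and `decrement` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class IdIndex {
    // Maps the ID in column 0 to the row array itself (not a copy).
    private final Map<Integer, int[]> rowsById = new HashMap<>();

    IdIndex(int[][] table) {
        for (int[] row : table) {
            rowsById.put(row[0], row);
        }
    }

    // Subtracts amount from the given 0-based column of the row with this ID.
    // Returns false if no row has that ID.
    boolean decrement(int id, int column, int amount) {
        int[] row = rowsById.get(id);
        if (row == null) {
            return false;
        }
        row[column] -= amount;
        return true;
    }

    public static void main(String[] args) {
        int[][] table = {
            {10, 3, 4, 5, 100, 314, 45},
            {13, 5, 7, 4, 160, 210, 40},
        };
        IdIndex index = new IdIndex(table);
        index.decrement(10, 6, 5); // the 7th column is index 6
        System.out.println(table[0][6]);
    }
}
```

Because the map stores references to the row arrays, the update is visible in the original table as well.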
So far I have been doing these operations in Matlab, which was very easy with its library functions. However, I now need to convert the whole code to Java. One way to do this is to use large arrays and for loops to access every row. Will that be efficient for such a big array? Please help me in this regard.
Upvotes: 1
Views: 1867
Reputation: 64
Well, I will try to define your needs and suggest an appropriate data structure based on them. 1. You need fast access to the elements, so avoid LinkedList and use either an ArrayList or a plain (static) array. 2. Since your data is large, I recommend that you don't load it all into main memory at once (dynamic loading).
Note: there are more advanced ways to optimize access, such as a B+ tree, but I don't want to go deeper. Try what I said above; I don't think you will need to optimize further (if you implement the dynamic loading correctly and efficiently).
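To make the array suggestion concrete, here is a minimal sketch of a table backed by one flat `int[]` in row-major order. It assumes all values fit in an `int`, and the class name `FlatTable` is made up for the example. With 10 million rows and 7 columns, the backing array holds 70 million ints, roughly 280 MB, so the JVM heap limit may need to be raised for the full data set.

```java
class FlatTable {
    private final int[] data;   // row-major: row r occupies data[r * columns ... r * columns + columns - 1]
    private final int columns;

    FlatTable(int rows, int columns) {
        this.columns = columns;
        // multiplyExact throws ArithmeticException if rows * columns overflows an int.
        this.data = new int[Math.multiplyExact(rows, columns)];
    }

    int get(int row, int col) {
        return data[row * columns + col];
    }

    void set(int row, int col, int value) {
        data[row * columns + col] = value;
    }

    int rowCount() {
        return data.length / columns;
    }

    public static void main(String[] args) {
        FlatTable table = new FlatTable(2, 7);
        int[] first = {10, 3, 4, 5, 100, 314, 45};
        for (int c = 0; c < first.length; c++) {
            table.set(0, c, first[c]);
        }
        System.out.println(table.get(0, 6));
    }
}
```

A primitive array avoids the per-element boxing that an `ArrayList<Integer>` would incur, which is the main reason to prefer it at this size.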
Upvotes: 1
Reputation: 76201
Firstly, I would suggest using an in-memory RDBMS like SQLite, HyperSQL, or JavaDB.
After that, you could take a look at Table in Google's Guava library.
Row based lookups are fastest with HashBasedTable and TreeBasedTable, but you may want to consider ArrayTable since it looks like your data is not sparse.
Finally, take a look at this question.
Upvotes: 3