Shery

Reputation: 1892

Sort by date and time

First of all, many thanks to all of you for your prompt responses. You are all awesome!

Let me elaborate a bit more.

I have a CSV file that is 2 GB in size. It has 4 columns: Network_id, uptime, dateAndTimeStamp and directory number.

Sample values look like this:

  • Network_id: CBUK12345678
  • uptime: 550, 800, 600 (in seconds)
  • dateAndTimeStamp: 0013-09-14 10:00:00 PM

It has 3 months of data based on dateAndTimeStamp. The interval between consecutive dateAndTimeStamp values is 15 minutes, i.e. if one value is 0013-09-14 10:00:00 PM, the next would be 0013-09-14 10:15:00 PM.

There are missing timestamp values in the data. I want to sort the data for a 24-hour period (it doesn't matter which date, it just has to be a full 24 hours, i.e. 86400 seconds). Once I have sorted the data for a complete 24 hours (without any missing values in it), I want to filter out those network ids whose uptime is 95% of 86400 seconds in those 24 hours, i.e. add up the uptime for each 15-minute interval and see whether the total is 95% of 86400 seconds.

I want to select those network ids.

Upvotes: 0

Views: 137

Answers (4)

Jeus

Reputation: 486

The best way to sort by date and time is to insert the data into a database. Once the data is in a database, you can sort it (ORDER BY).

Upvotes: 0

Raffaele

Reputation: 20885

Well, if this query is all you need, maybe you don't need to set up a database backend, and the following may be enough:

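import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Scanner;

public void processFile(Reader reader, Writer out) throws IOException {
  // Scanner reads one line at a time, so the 2 GB input is never held in memory
  Scanner in = new Scanner(reader);
  while (in.hasNextLine()) {
    processLine(in.nextLine(), out);
  }
}

public void processLine(String text, Writer out) throws IOException {
  // Line is a class with three fields to represent network, uptime and timestamp
  Line line = parseCSV(text);
  // The filter interface is extracted for testing purposes
  if (filter.accept(line)) {
    out.write(/* whatever */);
  }
}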

This sample uses a streaming approach, i.e. lines are evaluated and written to out as soon as they are read, since you state your input is 2 GB. Note that I purposefully left out any filtering and CSV parsing logic.

For CSV you may use an external library, but for a quick and dirty solution the old String.split(",") may be enough. It doesn't work if the input contains escape characters or quotes, but that doesn't seem to be the case from your description.
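As a rough sketch of the quick and dirty String.split(",") route, the parsing step might look like the code below. The class names UptimeRecord and CsvLineParser are made up for illustration, and the column order (network id, uptime, timestamp, directory number) and the timestamp pattern are assumptions taken from the sample values in the question, so adjust them to the real file.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical value class: one parsed CSV row (the directory number is ignored).
class UptimeRecord {
    final String networkId;
    final int uptimeSeconds;
    final LocalDateTime timestamp;

    UptimeRecord(String networkId, int uptimeSeconds, LocalDateTime timestamp) {
        this.networkId = networkId;
        this.uptimeSeconds = uptimeSeconds;
        this.timestamp = timestamp;
    }
}

// Hypothetical parser; assumes no quoted fields and no commas inside values.
class CsvLineParser {
    // Assumed timestamp layout, e.g. "0013-09-14 10:00:00 PM" (12-hour clock with AM/PM).
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("uuuu-MM-dd hh:mm:ss a")
            .toFormatter(Locale.ENGLISH);

    static UptimeRecord parse(String line) {
        String[] fields = line.split(",");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 columns: " + line);
        }
        return new UptimeRecord(
                fields[0].trim(),
                Integer.parseInt(fields[1].trim()),
                LocalDateTime.parse(fields[2].trim(), FORMAT));
    }
}
```

A call such as CsvLineParser.parse(line) would then stand in for the parseCSV step, under the assumptions above.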

The filtering logic may simply compare the uptime against a threshold value (otherwise, please explain more clearly what dateAndTimeStamp refers to).

Upvotes: 1

adi

Reputation: 1741

Read the file line by line (look here). Going by your description, you only need to select data, not sort it. So while you go through each line, check whether the line meets the requirement. If it does, save the data or a reference to it elsewhere.
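One way this single pass could look is sketched below: it keeps a running total per network id and calendar day instead of sorting. Everything here is an assumption layered on the question, not a confirmed requirement: "24 hour period" is read as a calendar day, a day counts as complete when all 96 fifteen-minute slots are present, "95%" is read as at least 82080 seconds, there is no header row, and the class name DailyUptimeSelector is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: selects network ids with at least one complete, high-uptime day.
class DailyUptimeSelector {
    static final int SLOTS_PER_DAY = 96;           // 24 h / 15 min
    static final long THRESHOLD_SECONDS = 82_080;  // 95% of 86400 s

    // Assumed timestamp layout, e.g. "0013-09-14 10:00:00 PM".
    private static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("uuuu-MM-dd hh:mm:ss a")
            .toFormatter(Locale.ENGLISH);

    // Running totals for one network id on one calendar day.
    private static final class DayTotal {
        final BitSet slotsSeen = new BitSet(SLOTS_PER_DAY);
        long uptimeSeconds;
    }

    static Set<String> select(Reader input) {
        Map<String, DayTotal> totals = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Assumed column order: network id, uptime, timestamp, directory number.
                String[] fields = line.split(",");
                String networkId = fields[0].trim();
                long uptime = Long.parseLong(fields[1].trim());
                LocalDateTime ts = LocalDateTime.parse(fields[2].trim(), FORMAT);
                int slot = (ts.getHour() * 60 + ts.getMinute()) / 15;
                DayTotal total = totals.computeIfAbsent(
                        networkId + "|" + ts.toLocalDate(), key -> new DayTotal());
                if (!total.slotsSeen.get(slot)) { // count each 15-minute slot only once
                    total.slotsSeen.set(slot);
                    total.uptimeSeconds += uptime;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, DayTotal> entry : totals.entrySet()) {
            DayTotal total = entry.getValue();
            boolean complete = total.slotsSeen.cardinality() == SLOTS_PER_DAY;
            if (complete && total.uptimeSeconds >= THRESHOLD_SECONDS) {
                String key = entry.getKey();
                selected.add(key.substring(0, key.lastIndexOf('|')));
            }
        }
        return selected;
    }
}
```

Memory grows with the number of (network id, day) pairs rather than with the number of lines, which is why no sort is needed. If the 24 hours may start at any timestamp rather than at midnight, a sliding window per network id would be required instead.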

Upvotes: 1

aphex

Reputation: 3412

I would import the CSV file into a database and query it there.

There are some other tools you can use, but I cannot guarantee that they would work with such a big file.

  • OpenOffice Database or Excel
  • Use LogParser (Windows Tool)

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart.

Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser

Upvotes: 0
