Reputation: 21830
In a perfect world, I'd have a bunch of data readily available to me without any time spent asking for and receiving it. But in the context of real applications, like Google or Facebook, you have a mountain of data stored in a database that takes time to query, and then you're trying to process that data in order to draw meaningful conclusions / relationships.
In the context of counting and sorting lots of data in SQL, you'd store data in summary tables to avoid the processing... and just update those tables with cron. But statistical analysis and NLP seem to be different.
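To make that concrete, here's the kind of summary-table job I mean. This is only a sketch against a hypothetical SQLite `events` table with a `created_at` column; the refresh script would be kicked off by cron.

```python
# Minimal sketch of the summary-table pattern, assuming a hypothetical
# SQLite database with an `events(id, created_at, ...)` table.
import sqlite3

def refresh_daily_counts(db_path="app.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS daily_event_counts (
            day      TEXT PRIMARY KEY,
            n_events INTEGER
        )
    """)
    # Recompute the rollup so reads never have to scan the raw table.
    conn.execute("""
        INSERT OR REPLACE INTO daily_event_counts (day, n_events)
        SELECT date(created_at), COUNT(*)
        FROM events
        GROUP BY date(created_at)
    """)
    conn.commit()
    conn.close()

# Run from cron, e.g. nightly:
#   0 2 * * * /usr/bin/python3 refresh_counts.py
if __name__ == "__main__":
    refresh_daily_counts()
```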
The question is: at what point in the lifespan of the data should the actual statistical/NLP/etc. analysis occur?
Upvotes: 0
Views: 126
Reputation: 5082
The way you usually do this is: collect data, have it in some sort of database (SQL or NoSQL), and then for processing dump it into a Hadoop grid if it's a huge amount of data; otherwise do whatever you usually do. Then you have jobs analyzing that data and feeding the results back to you.
Get data -> Store it -> Dump it -> Analyze it -> Use results of offline analysis
Data crunching directly against the live database just doesn't work too well.
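As a rough illustration of that flow (not exact code for any particular stack): the `events` table, the local TSV dump, and the word-count "analysis" below are all placeholders, and at real scale the dump/analyze steps would be Hadoop jobs rather than one local script.

```python
# Sketch of "dump, then analyze offline, then feed results back",
# assuming a hypothetical SQLite `events(id, text)` table.
import sqlite3
from collections import Counter

def dump_events(db_path="app.db", out_path="events_dump.tsv"):
    # Step 1: one bulk read from the store, so analysis never hits it again.
    conn = sqlite3.connect(db_path)
    with open(out_path, "w", encoding="utf-8") as out:
        for event_id, text in conn.execute("SELECT id, text FROM events"):
            out.write(f"{event_id}\t{text}\n")
    conn.close()
    return out_path

def analyze(dump_path):
    # Step 2: offline analysis against the dump (here, a trivial word count).
    counts = Counter()
    with open(dump_path, encoding="utf-8") as f:
        for line in f:
            _, text = line.rstrip("\n").split("\t", 1)
            counts.update(text.lower().split())
    return counts

def store_results(counts, db_path="app.db"):
    # Step 3: feed the results back to a table the application can read cheaply.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT PRIMARY KEY, n INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (word, n) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    store_results(analyze(dump_events()))
```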
Upvotes: 1
Reputation: 16099
It depends what you have in mind when you say NLP. The moment a few tens of tweets/status updates are stored somewhere, you can start reading and analysing them. It's probably not a great idea to repeatedly query your only production server while the NLP is taking place; you might want to take a dump of whatever data there is and work from there.
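For example, something like the following is already enough to start poking at a small dump. The `status_updates.txt` file (one update per line) and the regex tokenizer are just placeholders for whatever NLP you actually want to run; the point is that you take the dump once and then iterate locally as often as you like.

```python
# Work from a one-off dump instead of hammering the production database.
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")

def load_dump(path="status_updates.txt"):
    # The dump is produced once (e.g. a single SELECT or mysqldump run).
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def term_frequencies(updates):
    # Toy "NLP": lowercase, tokenize, count terms across all updates.
    counts = Counter()
    for text in updates:
        counts.update(TOKEN_RE.findall(text.lower()))
    return counts

if __name__ == "__main__":
    for term, n in term_frequencies(load_dump()).most_common(20):
        print(term, n)
```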
Upvotes: 0