I'm collecting data (texts) through an API (a live streaming API) about a specific event that is currently happening. The data I receive is based on a default list of keywords that I pass to the API. The API also picks up other keywords that occur in the received texts and adds them to my default list, so that it can search for data containing those keywords too. That is where the problem occurs: some of the newly added keywords are not related to the event. I do not want to limit the search to my default list only, because I cannot cover all of the keywords that are used in the texts.
My idea so far is to compute the point-biserial correlation coefficient for each batch of 1000 received texts, but I am not sure whether that is the correct approach or how to do it.
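To make the idea concrete, here is a minimal sketch of how I imagine it could work; this is an assumption on my part, not something I have validated. For each candidate keyword, the binary variable is whether the keyword occurs in a text, and the numeric variable is how many of my default keywords occur in the same text. The function names (`point_biserial`, `keyword_relevance`), the whitespace tokenization, and the choice of "count of default keywords" as the numeric score are all hypothetical.

```python
import math

def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and a numeric one.

    Returns None when the coefficient is undefined (one group is empty
    or the scores have zero variance).
    """
    if len(binary) != len(scores):
        raise ValueError("binary and scores must have the same length")
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b]
    group0 = [s for b, s in zip(binary, scores) if not b]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        return None
    mean = sum(scores) / n
    # Population variance (divide by n), matching the r_pb formula used below.
    variance = sum((s - mean) ** 2 for s in scores) / n
    if variance == 0:
        return None
    m1 = sum(group1) / n1
    m0 = sum(group0) / n0
    return (m1 - m0) / math.sqrt(variance) * math.sqrt(n1 * n0 / (n * n))

def keyword_relevance(texts, default_keywords, candidate):
    """Correlate presence of `candidate` with the count of default keywords.

    Assumption: texts are tokenized by lowercasing and splitting on whitespace.
    """
    defaults = {k.lower() for k in default_keywords}
    token_sets = [set(t.lower().split()) for t in texts]
    present = [1 if candidate.lower() in tokens else 0 for tokens in token_sets]
    scores = [len(tokens & defaults) for tokens in token_sets]
    return point_biserial(present, scores)

# Toy batch; in practice this would be one batch of 1000 received texts.
batch = [
    "earthquake rescue teams",
    "earthquake rescue update",
    "football match tonight",
    "football scores",
]
print(keyword_relevance(batch, {"earthquake"}, "rescue"))
print(keyword_relevance(batch, {"earthquake"}, "football"))
```

Under this sketch, a candidate keyword with a low or negative coefficient in a batch would be a candidate for removal from the list, although I do not know what cutoff would be reasonable.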
I would really appreciate any advice or suggestions on how to approach this problem.