Reputation: 355
I am considering performing some text mining on a set of large individual .pst files containing more than 4 years of communication.
Initially, I would like to extract just the header information to identify social networks, but ultimately I would like to classify emails based on keywords or create structured output that would support further analysis.
Does anyone have any suggestions where to begin?
Upvotes: 1
Views: 3201
Reputation: 78
You should check the research done on the publicly available Enron Email Dataset. The dataset's page links to some interesting papers.
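As a rough starting point for the header step, here is a minimal sketch using only the Python standard library. It assumes the .pst files have already been exported to plain RFC 822 message text (for example with a converter such as readpst from libpst; that conversion step is an assumption and is not shown). The names `header_edges`, `tag_by_keywords`, `SAMPLE`, and the keyword map are hypothetical and only for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def header_edges(raw_messages):
    """Count (sender, recipient) pairs found in the From/To/Cc headers."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [
            addr.lower()
            for _, addr in getaddresses(msg.get_all("From", []))
            if addr
        ]
        recipients = [
            addr.lower()
            for field in ("To", "Cc")
            for _, addr in getaddresses(msg.get_all(field, []))
            if addr
        ]
        # One directed edge per sender/recipient pair in this message.
        for sender in senders:
            for recipient in recipients:
                edges[(sender, recipient)] += 1
    return edges


def tag_by_keywords(subject, keyword_map):
    """Return the sorted labels whose keywords occur in the subject line."""
    text = subject.lower()
    return sorted(
        label
        for label, words in keyword_map.items()
        if any(word in text for word in words)
    )


# Hypothetical sample messages standing in for the exported mail.
SAMPLE = [
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, Carol <carol@example.com>\n"
    "Subject: Budget review\n\nBody",
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n\nBody",
]

if __name__ == "__main__":
    for (sender, recipient), count in header_edges(SAMPLE).items():
        print(sender, "->", recipient, count)
```

The resulting edge counts can be loaded into any graph tool as a weighted, directed edge list. The keyword tagger is plain substring matching, so it is only a first pass before trying a real classifier.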
Upvotes: 2