vy32

Reputation: 29667

Natural language de-identification

I am looking for a natural language tool that can automatically de-identify English text. For example, every email address should be renamed or obscured. Proper names should be de-identified as well, as should postal addresses and the like.

There is the MITRE Identification Scrubber Toolkit (MIST), but I don't know how well it works.

My questions:

Thanks.

Upvotes: 3

Views: 586

Answers (1)

jogojapan

Reputation: 69997

De-identification (perhaps more often called anonymization) is a very active research area, because authentic text corpora can only be used in fields such as NLP for healthcare and medicine if it succeeds. I recommend looking at the tools listed in the answer to this question on CrossValidated. If you follow the links further, you will find research papers that describe how these tools work, along with further references and evaluation results.
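For the email part of the question alone, a regular expression may already go a long way. The following is only a minimal sketch and not how any of the tools mentioned here work: the function name `scrub_emails`, the `salt` parameter and the `<EMAIL_...>` token format are made up for illustration, and the pattern is deliberately simple. Proper names and postal addresses are not covered; those generally need a trained named-entity recognizer.

```python
import hashlib
import re

# Deliberately simple pattern (an assumption for this sketch); it does not
# cover every address that is valid under RFC 5322.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_emails(text, salt="change-me"):
    """Replace each email address with a stable token such as <EMAIL_1a2b3c4d>.

    The same address (compared case-insensitively) always maps to the same
    token, so references stay linkable within the scrubbed text.
    """
    def _pseudonym(match):
        # Hash the salted, lower-cased address and keep a short prefix.
        data = (salt + match.group(0).lower()).encode("utf-8")
        return "<EMAIL_" + hashlib.sha256(data).hexdigest()[:8] + ">"

    return EMAIL_RE.sub(_pseudonym, text)


if __name__ == "__main__":
    print(scrub_emails("Write to alice@example.com or bob@example.org."))
```

Note that this is pseudonymization rather than strong anonymization: anyone who knows the salt can check whether a given address maps to a given token, so the salt has to be kept secret or a random mapping table used instead.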

Upvotes: 2
