Sat Cit Ananda

Reputation: 240

Which language or tools to learn for natural language processing?

I am French and a former Certified Network Security Administrator. I went back to university 3 years ago to earn a Bachelor's degree in linguistics, and I am now going to enroll in a Master's degree in Computer Science applied to Linguistics, with the objective of eventually pursuing a Doctorate (but I'm not there yet :-) ).

The course will focus on speech recognition, automatic language translation, statistical analysis of texts, speech encoding and decoding, and information abstraction from textual sources. The professors will let us use any programming language we want for the algorithms and programs we develop during the curriculum.

I developed web apps as a side gig for about 3-4 years, and I am proficient in JavaScript: I wrote software that used Node.js on the server and the browser on the client. I also have some familiarity with PostgreSQL.

My current style of coding (if we can call it a style) is mainly procedural, and I use object prototyping as my main way to create and manage objects in my code. I don't have much experience with object-oriented languages that use classes to manage objects. I am therefore pretty confident that my current coding skills fall short of what I need to write efficient code for this kind of work.

So my question is this: what would be the best programming language for me to learn in order to write effective algorithms and data structures for the linguistic areas mentioned above?

Thanks in advance for your enlightened answers.

Sat Cit Ananda.

Upvotes: 1

Views: 3754

Answers (2)

Andrew Tomazos

Reputation: 68668

For production NLP systems, Java seems to be the most common choice. It is a nice, safe language for beginner and intermediate programmers: it scales well with codebase size, has a simple grammar and a vast standard library, and is one of the most commonly used languages where software performance isn't the absolute top priority (or where performance can be scaled horizontally by distributing the work). I believe, for example, that most of the higher layers of IBM Watson are written in Java. You'll also find it as one of the primary teaching languages in CS courses.

Upvotes: 0

Your question is opinion-based, so it is probably off-topic here.

In France, you have a lot of good courses on OCaml, which is developed at INRIA, and several good books (notably, both in French, Developpement d'Applications en Ocaml by Chailloux, Manoury, Pagano; and Programmation de Droite à Gauche & vice versa by Manoury). J. Pitrat also wrote Textes, Ordinateurs et Compréhension; his latest book, Artificial Beings: The Conscience of a Conscious Machine, will also interest you.

And learning several programming languages, not only one, is always useful. A single programming language is not enough to do Natural Language Processing; you need to learn several languages and several programming paradigms (both the functional and the object paradigms are useful, and so is Prolog). You could also start reading SICP while learning Scheme. Learning more about Lisp-like languages through Queinnec's book Principe d'implementation de Scheme et Lisp (the updated version of Lisp In Small Pieces) will also teach you a great deal.
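To make the point about paradigms a little more concrete, here is a minimal sketch of the same small text-statistics task (counting adjacent word pairs) written once in a functional style and once in an object style. Python is used only to keep the example short; it is not one of the languages recommended above, and the names `tokenize`, `count_bigrams` and `BigramCounter` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


# Functional style: a pure function from text to counts, built by composing
# small steps, with no state kept between calls.
def count_bigrams(text):
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


# Object style: a class that owns its counts and is updated incrementally,
# which is convenient when texts arrive one at a time.
class BigramCounter:
    def __init__(self):
        self.counts = Counter()

    def update(self, text):
        tokens = tokenize(text)
        self.counts.update(zip(tokens, tokens[1:]))

    def most_common(self, n=1):
        return self.counts.most_common(n)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat slept"
    print(count_bigrams(sample)[("the", "cat")])

    counter = BigramCounter()
    counter.update(sample)
    print(counter.most_common(1))
```

Both versions compute the same counts; the difference is where the state lives, which is the kind of design choice that knowing more than one paradigm helps you make.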

Java might also be useful (because some NLP libraries are available in Java). So might Common Lisp, C++11, Haskell, and others.

Also take time to use and master Linux (and its programming) and free software.

In general, natural language processing requires a lot of computer science (and math).

Upvotes: 3
