Reputation: 135
I have looked through Stack Overflow, but none of the examples I tried work in my case.
I want to count how many times each string occurs in a list. The list is built by splitting an input String, such as "Henry and Harry went out", into overlapping character sequences of a given length (2 in the following example), and each distinct sequence is then counted. Please forgive me if my style is bad; it's my first project.
He = 1
en = 2
nr = 1
ry = 2
a = 1
an = 1
etc.
Here is my code for the constructor:
public NgramAnalyser(int n, String inp)
{
    boolean processed = false;
    ngram = new HashMap<>(); // used to store the ngram strings and count
    alphabetSize = 0;
    ngramSize = n;
    ArrayList<String> tempList = new ArrayList<String>();
    System.out.println("inp length: " + inp.length());
    System.out.println();
    int finalIndex = 0;
    for(int i=0; i<inp.length()-(ngramSize - 1); i++)
    {
        tempList.add(inp.substring(i,i+ngramSize));
        alphabetSize++;
        if(i == (inp.length()- ngramSize))
        // if i (the index) has reached the boundary limit (before it gets an error), then...
        {
            processed = true;
            finalIndex = i;
            break;
        }
    }
    if(processed == true)
    {
        for(int i=1; i<(ngramSize); i++)
        {
            String startString = inp.substring(finalIndex+i,inp.length());
            String endString = inp.substring(0, i);
            tempList.add(startString + endString);
        }
    }
    for(String item: tempList)
    {
        System.out.println(item);
    }
}
// code for counting the ngrams and sorting them
Upvotes: 1
Views: 221
Reputation: 1
This code converts the string to a single case, removes whitespace, and turns it into a char array. It then inserts each character into a map one by one: if the character already exists as a key, its count is incremented by one; otherwise it is added with a count of one. Good luck.
// requires java.util.Arrays, java.util.HashMap and java.util.Map
// take any string, convert it to one case (lower or upper), strip whitespace, then turn it into a character array
char[] charArray = "This is an example text".replaceAll("\\s","").toLowerCase().toCharArray();
System.out.println(Arrays.toString(charArray));
Map<Character, Integer> charCount = new HashMap<>();
for (char c : charArray){
    // if the key doesn't exist, put it with a count of 1
    if(!charCount.containsKey(c)){
        charCount.put(c, 1);
    }else{
        // if the key exists, increment its count by 1
        charCount.put(c, charCount.get(c) + 1);
    }
}
System.out.println(charCount.toString());
output:
[t, h, i, s, i, s, a, n, e, x, a, m, p, l, e, t, e, x, t]
{p=1, a=2, s=2, t=3, e=3, h=1, x=2, i=2, l=1, m=1, n=1}
Upvotes: 0
Reputation: 364
This method creates a HashMap whose keys are the distinct n-grams and whose values are their counts. I think the code is pretty easy to understand, but ask if anything is unclear or looks wrong.
public Map<String, Integer> ngram(String inp, int n)
{
    Map<String, Integer> nGram = new HashMap<>();
    // the last valid start index is inp.length() - n, so the bound must be inclusive
    for(int i = 0; i <= inp.length() - n; i++)
    {
        String item = inp.substring(i, i+n);
        // getOrDefault returns 0 the first time an n-gram is seen
        int itemCount = nGram.getOrDefault(item, 0);
        nGram.put(item, itemCount+1);
    }
    return nGram;
}
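The constructor in the question also adds n-grams that wrap from the end of the input back to its start. If that behaviour is wanted in the counts, one possible sketch is to append the first n - 1 characters to the input and slide a plain window over the result. The class and method names here (WrappedNgramCounter, countWrapped) are made up for illustration and are not part of the question's code.

```java
import java.util.HashMap;
import java.util.Map;

class WrappedNgramCounter {
    // Counts every n-gram of length n, treating the input as circular.
    // Hypothetical helper, not the asker's NgramAnalyser.
    static Map<String, Integer> countWrapped(String inp, int n) {
        Map<String, Integer> counts = new HashMap<>();
        if (inp == null || n <= 0 || n > inp.length()) {
            return counts; // nothing sensible to count
        }
        // Appending the first n-1 characters lets a plain sliding window
        // produce the wrap-around n-grams as well.
        String extended = inp + inp.substring(0, n - 1);
        for (int i = 0; i < inp.length(); i++) {
            // merge inserts 1 for a new key, otherwise adds 1 to the old count
            counts.merge(extended.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the print order may vary
        System.out.println(countWrapped("abcab", 2));
    }
}
```

For "abcab" with n = 2 this counts ab, bc, ca, ab and the wrapped ba, the same strings the question's constructor puts into tempList.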
Upvotes: 0
Reputation: 30197
A simple solution should use the Map<String, Integer> ngram and, while iterating over your list of n-grams, for each key (the String) found in your input, update its counter (the Integer).
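A rough sketch of that idea, assuming the n-grams are already collected in a list like the question's tempList. It also sorts the result by descending count, since the question mentions sorting. The names NgramTally, tally and sortedByCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NgramTally {
    // Builds the n-gram -> count map from a list of n-grams.
    static Map<String, Integer> tally(List<String> ngrams) {
        Map<String, Integer> counts = new HashMap<>();
        for (String gram : ngrams) {
            // first occurrence stores 1, later ones add 1 to the stored count
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the entries ordered by count (highest first), ties broken by key.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> grams = Arrays.asList("He", "en", "nr", "ry", "ry");
        for (Map.Entry<String, Integer> e : sortedByCount(tally(grams))) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Sorting a copy of the entry set keeps the HashMap itself unchanged; a TreeMap would be an alternative if ordering by key alone is enough.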
Upvotes: 2