IH9522

Reputation: 11

Tokenize method: Split string into array

I've been really struggling with a programming assignment. Basically, we have to write a program that translates an English sentence into Pig Latin. The first method we need is one to tokenize the string, and we are not allowed to use the split method normally used in Java. I've been trying to do this for the past 2 days with no luck. Here is what I have so far:

    public class PigLatin
    {
        public static void main(String[] args)
        {
            String s = "Hello there my name is John";
            Tokenize(s);
        }

        public static String[] Tokenize(String english)
        {
            String[] tokenized = new String[english.length()];
            for (int i = 0; i < english.length(); i++)
            {
                int j = 0;
                while (english.charAt(i) != ' ')
                {
                    String m = "";
                    m = m + english.charAt(i);
                    if (english.charAt(i) == ' ')
                    {
                        j++;
                    }
                    else
                    {
                        break;
                    }
                }
                for (int l = 0; l < tokenized.length; l++) {
                    System.out.print(tokenized[l] + ", ");
                }
            }
            return tokenized;
        }
    }

All this does is print an enormously long array of "null"s. If anyone can offer any input at all, I would really appreciate it!

Thank you in advance.

Update: We are supposed to assume that there will be no punctuation or extra spaces, so basically whenever there is a space, a new word starts.

Upvotes: 1

Views: 1791

Answers (4)

epipav

Reputation: 339

             // Requires: import java.util.ArrayList; import java.util.List;
             String english = "hello my fellow friend";
             List<String> tokenized = new ArrayList<>();
             String m = "";
             for (int i = 0; i < english.length(); i++)
             {
                   // The order of the conditions matters here: if you swap
                   // them, english.charAt(i) can throw an
                   // index-out-of-bounds exception.
                   while (i < english.length() && english.charAt(i) != ' ')
                   {
                         m = m + english.charAt(i);
                         i++;
                   }
                   // Add to the list only if a word was collected.
                   // If we only saw ' ', m is empty and nothing is added.
                   if (m.length() > 0)
                   {
                         tokenized.add(m);
                         m = "";
                   }
             }
             // Print the list.
             for (int l = 0; l < tokenized.size(); l++) {
                   System.out.print(tokenized.get(l) + ", ");
             }

This prints "hello, my, fellow, friend, ". I used an ArrayList because the number of words, and so the length of the array, is not known up front.
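If the assignment insists on a plain array, one option is to work out the length first. Here is a minimal sketch of that idea. It relies on the assumption from the question's update (exactly one space between words, no leading or trailing spaces), and the class and method names are just placeholders I made up:

```java
import java.util.Arrays;

public class ArrayTokenizeSketch {
    // Hypothetical helper: assumes words are separated by exactly one space,
    // with no leading or trailing spaces (the assumption stated in the question).
    static String[] tokenize(String english) {
        if (english.isEmpty()) {
            return new String[0];
        }
        // First pass: there is one more word than there are spaces.
        int words = 1;
        for (int i = 0; i < english.length(); i++) {
            if (english.charAt(i) == ' ') {
                words++;
            }
        }
        String[] tokenized = new String[words];
        // Second pass: build each word and store it when a space is reached.
        String m = "";
        int j = 0;
        for (int i = 0; i < english.length(); i++) {
            char c = english.charAt(i);
            if (c == ' ') {
                tokenized[j++] = m;
                m = "";
            } else {
                m = m + c;
            }
        }
        tokenized[j] = m; // the last word has no trailing space
        return tokenized;
    }

    public static void main(String[] args) {
        // prints [Hello, there, my, name, is, John]
        System.out.println(Arrays.toString(tokenize("Hello there my name is John")));
    }
}
```

With extra or trailing spaces this would produce empty strings in the array, so it only fits input that matches the stated assumption.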

Upvotes: 0

Adrian Shum

Reputation: 40036

Some hints for you to do the "manual splitting" work.

  1. There is a method String#indexOf(int ch, int fromIndex) to help you find the next occurrence of a character.
  2. There is a method String#substring(int beginIndex, int endIndex) to extract a certain part of a string.

Here is some pseudo-code that shows you how to split it (you will need more safety handling; I will leave that to you):

List<String> results = ...;
int startIndex = 0;
int endIndex = 0;

while (startIndex < inputString.length) {
    endIndex = get next index of space after startIndex
    if no space found {
        endIndex = inputString.length
    }
    String result = get substring of inputString from startIndex to endIndex-1
    results.add(result)
    startIndex = endIndex + 1  // move startIndex to next position after space
}

// here, results contains all the split words
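For reference, the pseudo-code above could be turned into Java roughly like this. It is only a sketch: the class and method names are hypothetical, and the check that skips empty pieces is one possible form of the "safety handling" mentioned, not something the assignment requires.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplitSketch {
    // Hypothetical method name; follows the pseudo-code using
    // String#indexOf(int, int) and String#substring(int, int).
    static List<String> splitOnSpaces(String inputString) {
        List<String> results = new ArrayList<>();
        int startIndex = 0;
        while (startIndex < inputString.length()) {
            // indexOf returns -1 when no further space exists
            int endIndex = inputString.indexOf(' ', startIndex);
            if (endIndex == -1) {
                endIndex = inputString.length();
            }
            // substring's end index is exclusive, so this takes the
            // characters from startIndex up to endIndex - 1
            if (endIndex > startIndex) { // skip empty pieces from repeated spaces
                results.add(inputString.substring(startIndex, endIndex));
            }
            startIndex = endIndex + 1; // move past the space
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [Hello, there, my, name, is, John]
        System.out.println(splitOnSpaces("Hello there my name is John"));
    }
}
```

If a String[] is needed, results.toArray(new String[0]) converts the list at the end.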

Upvotes: 0

Elliott Frisch

Reputation: 201437

If I understand your question and what your Tokenize was intended to do, then I would start by writing a function to split the String:

// Requires java.util.ArrayList and java.util.List (and java.util.Arrays for printing).
static String[] splitOnWhiteSpace(String str) {
    List<String> al = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    for (char ch : str.toCharArray()) {
        if (Character.isWhitespace(ch)) {
            if (sb.length() > 0) {
                al.add(sb.toString());
                sb.setLength(0);
            }
        } else {
            sb.append(ch);
        }
    }
    if (sb.length() > 0) {
        al.add(sb.toString());
    }
    String[] ret = new String[al.size()];
    return al.toArray(ret);
}

and then print using Arrays.toString(Object[]) like

public static void main(String[] args) {
    String s = "Hello there my name is John";
    String[] words = splitOnWhiteSpace(s);
    System.out.println(Arrays.toString(words));
}

Upvotes: 1

Dando18

Reputation: 622

If you're allowed to use the StringTokenizer class (which I think is what the assignment is asking for), it would look something like this:

StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}

which will produce the output:

 this
 is
 a
 test

Taken from here.

StringTokenizer splits the string on whitespace by default and returns one token per call to nextToken(). The body of the while loop is where you can apply the Pig Latin logic to each word.
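If you need the words in an array rather than just printed, something along these lines should work. This is a sketch that assumes StringTokenizer is allowed; the class and method names are placeholders, and the Pig Latin step is left as a comment since it depends on your assignment's rules.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class TokenizerArraySketch {
    // Hypothetical helper: collects the tokens into a String[].
    static String[] tokenize(String english) {
        StringTokenizer st = new StringTokenizer(english);
        // countTokens() reports how many tokens remain, which gives the array size
        String[] words = new String[st.countTokens()];
        int i = 0;
        while (st.hasMoreTokens()) {
            // the per-word Pig Latin translation could be applied here
            words[i++] = st.nextToken();
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [this, is, a, test]
        System.out.println(Arrays.toString(tokenize("this is a test")));
    }
}
```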

Upvotes: 0
