Omar

How can I capture part of a string using regular expressions?

In Java, I want to write a function that extracts parts of a string using regular expressions:

public HashMap<Integer, String> extract(String sentence, String expression) {
}

// For example, I need to pass a sentence like this:

HashMap<Integer, String> parts = extract("hello Jhon how are you", "(hello|hi) @1 how are @2");

// The expression validates the sentence: it must start with "hello" or "hi", followed by a word or group of words, then the words "how are", then more words.
// And I want to get this:

parts.get(1) --> "Jhon"
parts.get(2) --> "you"

// But the function should return null if I give it this:

extract("any other words","hello @1 how are @2");

I was doing it without regular expressions, but the code became rather long. I'm not sure whether regular expressions would be better (for example, faster), and if so, how I could do it with them.


Eugene

Thanks to @ajb's comment, I've modified my answer to meet Omar's requirement. It turned out to be more complicated than I thought.

I assume Omar wants to use the expression he provided to capture specific words. He uses @1, @2 ... @n to mark what he wants to capture, and the integer n is also the key used to retrieve the captured text from the map.
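A minimal sketch of just the bookkeeping step for that convention: scan the template for @n placeholders and list their keys in order of appearance. If each placeholder later becomes one capturing group, the i-th group belongs to the i-th key in this list, so a template like "whats the (number of @1|@1s number)" yields two groups that both map to key 1. "PlaceholderKeys" and "keysInOrder" are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration: lists the @n keys of a template
// in the order they appear, one entry per placeholder (duplicates kept).
public class PlaceholderKeys {
    static List<Integer> keysInOrder(String template) {
        List<Integer> keys = new ArrayList<>();
        // "\d+" rather than "\d*", so a bare "@" is not treated as a placeholder.
        Matcher m = Pattern.compile("@(\\d+)").matcher(template);
        while (m.find()) {
            keys.add(Integer.parseInt(m.group(1)));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysInOrder("(hello|hi) @1 how are @2"));           // [1, 2]
        System.out.println(keysInOrder("whats the (number of @1|@1s number)")); // [1, 1]
    }
}
```

Keeping duplicates is deliberate: the position in the list, not the key value, is what lines up with the group number.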

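Edit: since the OP wants to put @n placeholders inside parentheses, I preprocess the expression to replace "(" with "(?:". The parentheses then still group (for example, the alternation in "(hello|hi)"), but they no longer capture, so the only capturing groups left are the ones generated for @n.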
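To see what the "(?:" rewrite changes, here is a small sketch that builds the regex from the template in the same two steps (rewrite "(", then turn each @n into a word-capturing group), once without and once with the rewrite, and compares the resulting groups. "NonCapturingDemo" and "toRegex" are illustrative names, not part of the answer's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Builds a regex from a template: optionally turn "(" into "(?:",
    // then replace each @n with a group that captures one word.
    static String toRegex(String template, boolean nonCapturing) {
        String expr = nonCapturing ? template.replaceAll("\\(", "(?:") : template;
        return expr.replaceAll("@\\d+", "(\\\\w+)");
    }

    public static void main(String[] args) {
        String template = "(hello|hi) @1 how are @2";
        String sentence = "hello Jhon how are you";

        Matcher plain = Pattern.compile(toRegex(template, false)).matcher(sentence);
        Matcher fixed = Pattern.compile(toRegex(template, true)).matcher(sentence);

        if (plain.matches() && fixed.matches()) {
            // Without the rewrite, "(hello|hi)" is group 1, so @1 lands in group 2.
            System.out.println(plain.groupCount() + " " + plain.group(1)); // 3 hello
            // With "(?:", group 1 is the @1 placeholder.
            System.out.println(fixed.groupCount() + " " + fixed.group(1)); // 2 Jhon
        }
    }
}
```

Without the rewrite every @n group number is shifted by the number of "(" that precede it, which is why the preprocessing step is needed before group numbers can be matched to keys.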

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {

        Test test = new Test();
        String sentence1 = "whats the number of apple";
        String expression1 = "whats the (number of @1|@1s number)";
        HashMap<Integer, String> map1 = test.extract(sentence1, expression1);
        System.out.println(map1);
        String sentence2 = "whats the bananas number";
        HashMap<Integer, String> map2 = test.extract(sentence2, expression1);
        System.out.println(map2);
        String sentence3 = "hello Jhon how are you";
        String expression3 = "(hello|hi) @1 how are @2";
        HashMap<Integer, String> map3 = test.extract(sentence3, expression3);
        System.out.println(map3);
    }

    public HashMap<Integer, String> extract(String sentence, String expression) {
        // Make the expression's own groups non-capturing, so the only capturing
        // groups left are the ones generated for @n below. The lookahead skips
        // any "(" that already starts a "(?...)" construct.
        expression = expression.replaceAll("\\((?!\\?)", "(?:");

        // The i-th @n placeholder becomes capturing group i; remember its key n.
        // "\d+" rather than "\d*", so a bare "@" cannot cause a NumberFormatException.
        List<Integer> groupKeys = new ArrayList<Integer>();
        Matcher matcher4Expression = Pattern.compile("@(\\d+)").matcher(expression);
        while (matcher4Expression.find()) {
            groupKeys.add(Integer.valueOf(matcher4Expression.group(1)));
        }

        // Each @n captures one word.
        String regex = expression.replaceAll("@\\d+", "(\\\\w+)");
        Matcher matcher = Pattern.compile(regex).matcher(sentence);

        // The whole sentence must fit the expression; otherwise return null.
        if (!matcher.matches()) {
            return null;
        }

        HashMap<Integer, String> map = new HashMap<Integer, String>();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            // Groups inside an alternative that did not match are null.
            if (matcher.group(i) != null) {
                map.put(groupKeys.get(i - 1), matcher.group(i));
            }
        }
        return map;
    }
}

The output is:

{1=apple}
{1=banana}
{1=Jhon, 2=you}
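The groups above capture a single word, while the question says an @n may be "a word or group of words". A possible variant, assuming the whole sentence must fit the template, swaps in a lazy "(.+?)" for each placeholder and returns null when the sentence doesn't match, as the question asks. "TemplateExtractor" is a hypothetical name, and this is a sketch rather than a drop-in replacement.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: @n may capture several words, and the whole
// sentence must fit the template, otherwise extract returns null.
public class TemplateExtractor {
    public static HashMap<Integer, String> extract(String sentence, String expression) {
        // Make the template's own groups non-capturing (skip "(?..." constructs).
        String expr = expression.replaceAll("\\((?!\\?)", "(?:");

        // Remember which key each placeholder (and thus each group) belongs to.
        List<Integer> groupKeys = new ArrayList<>();
        Matcher ph = Pattern.compile("@(\\d+)").matcher(expr);
        while (ph.find()) {
            groupKeys.add(Integer.valueOf(ph.group(1)));
        }

        // Lazy "(.+?)" lets a placeholder span spaces; matches() anchors both ends.
        String regex = expr.replaceAll("@\\d+", "(.+?)");
        Matcher m = Pattern.compile(regex).matcher(sentence);
        if (!m.matches()) {
            return null;
        }

        HashMap<Integer, String> map = new HashMap<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            // Groups in an alternative that did not participate are null.
            if (m.group(i) != null) {
                map.put(groupKeys.get(i - 1), m.group(i));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(extract("hello Jhon Smith how are you", "(hello|hi) @1 how are @2")); // {1=Jhon Smith, 2=you}
        System.out.println(extract("any other words", "hello @1 how are @2"));                  // null
    }
}
```

The lazy quantifier stops each placeholder at the first point where the rest of the template can still match, which is what separates "Jhon Smith" from "you" here. As in the answer, the template's literal text is used as regex, so characters such as "." or "?" in it keep their regex meaning.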
