user2634655

Reputation:

Java regex Pattern/Matcher to identify unusual characters and Asian ideographs

I want to go through the following text and extract certain elements using Java regex patterns:

『卥』

For this element, 『卥』, I should always be able to find the item between 『 and 』 and extract it. This should be feasible because those brackets are fairly unusual characters, so they are a good basis for identifying and extracting whatever comes between them, i.e. 卥.

There's a lot of information on using the Java regex Pattern and Matcher classes to match entire classes of characters, but I've not found much on matching just one or two specific characters and extracting what lies between them. That should certainly be possible, shouldn't it? How can I do that?

Ideally something like

match(`『` and `』`)
{
     print(what comes between them)
}

Tried this, but didn't work:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class text_processing
{
    @SuppressWarnings("resource")
    public static void main(String[] args) throws IOException
    {
        String sCurrentLine; 
        BufferedReader br = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/1_February/brute_force/items.csv"));


        Pattern p = Pattern.compile("/『(.*?)』/");


        while ((sCurrentLine = br.readLine()) != null) 
        {
            Matcher m = p.matcher(sCurrentLine);
            System.out.println(m);
        }
    }
}

Thank you for your consideration

Upvotes: 2

Views: 120

Answers (2)

m0bi5

Reputation: 9462

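Use this regex, without the leading and trailing slashes. Java patterns are not written with / delimiters, so the slashes in your pattern are matched literally: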

"『(.*?)』"

Check out the working example here: https://regex101.com/r/lO8xR1/1
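As a rough sketch of why the attempt in the question prints nothing useful: the pattern there includes literal slashes, and it prints the Matcher object itself without ever calling find(). The class name BracketDemo, the helper firstBetween, and the sample line below are made up for illustration and are not from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketDemo {
    // Hypothetical helper: returns the text between the first 『 and 』,
    // or null if the pattern does not match anywhere in the line.
    static String firstBetween(Pattern p, String line) {
        Matcher m = p.matcher(line);
        // find() must be called before group(); the Matcher does nothing until then.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "1,『卥』,abc"; // assumed sample line, not the real items.csv

        // In Java the slashes are ordinary characters, so this looks for "/『...』/".
        Pattern withSlashes = Pattern.compile("/『(.*?)』/");
        Pattern plain = Pattern.compile("『(.*?)』");

        System.out.println(firstBetween(withSlashes, line)); // prints null
        System.out.println(firstBetween(plain, line));       // prints 卥
    }
}
```

The reluctant quantifier (.*?) stops at the first closing 』, so it also behaves sensibly when a line contains more than one bracketed item.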

Upvotes: 2

laune

Reputation: 31290

String text = ...; // your text
Pattern pat = Pattern.compile( "『([^』]*)』" );
Matcher mat = pat.matcher( text );
if( mat.find() ){
    System.out.println( mat.group(1) );
}

You can call find() repeatedly to find all occurrences:

while( mat.find() ){
    System.out.println( mat.group(1) );
}
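
Putting the pieces together with the line-by-line reading from the question, a minimal sketch might look like the following. The class name BracketExtractor, the method extractAll, and the in-memory sample text are assumptions for illustration; the StringReader stands in for the FileReader on the real CSV file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketExtractor {
    // Compile once and reuse: anything except 』 between 『 and 』.
    private static final Pattern BETWEEN = Pattern.compile("『([^』]*)』");

    // Hypothetical helper: collects every bracketed item found in one line.
    static List<String> extractAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = BETWEEN.matcher(line);
        while (m.find()) {          // each call advances to the next match
            found.add(m.group(1));  // group(1) is the text inside the brackets
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data; replace the StringReader with a FileReader for a real file.
        String sample = "1,『卥』,foo\n2,no brackets here\n3,『甲』 and 『乙』\n";

        // try-with-resources closes the reader, so no @SuppressWarnings is needed.
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            String line;
            while ((line = br.readLine()) != null) {
                for (String item : extractAll(line)) {
                    System.out.println(item);
                }
            }
        }
    }
}
```

When reading a real file, it may be worth opening it with an explicit charset (for example via Files.newBufferedReader with StandardCharsets.UTF_8) so the brackets and ideographs are decoded consistently across platforms.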

Upvotes: 1
