Tinuz

Reputation: 1

Hive combine column values based upon condition

I was wondering if it is possible to combine column values based upon a condition. Let me explain...

Let's say my data looks like this

Id  name     offset
1   Jan      100
2   Janssen  104
3   Klaas    150
4   Jan      160
5   Janssen  164

And my output should be this

Id  fullname     offsets
1   Jan Janssen  [ 100, 160 ]

I would like to combine the name values from two rows where the end of the first name and the offset of the second are no more than one character apart. In the sample above, Jan at offset 100 is 3 characters long and so ends at 103, and Janssen starts at 104, so the two get merged into Jan Janssen.

My question is whether this type of data manipulation is possible with Hive, and if it is, could someone share some code and an explanation?

Please be gentle, but this little piece of code returns more or less what I want...

    // Needs: import java.io.File; import java.util.ArrayList; import java.util.Scanner;
    ArrayList<String> persons = new ArrayList<String>();

    String _previous = "";

    //Sample lines from entities.txt
    //USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
    //USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
    File file = new File("entities.txt");

    try {
        //
        // Create a new Scanner object which will read the data
        // from the file passed in. scanner.hasNextLine() tells us
        // whether there is another line to read; we read the file
        // line by line until it is exhausted.
        //
        Scanner scanner = new Scanner(file);
        while (scanner.hasNextLine()) {

            // compare strings with isEmpty()/equals(), not ==, and check for null first
            if (_previous == null || _previous.isEmpty())
                _previous = scanner.nextLine();

            String _current = scanner.nextLine();
            //x = end of the previous name (length + offset); merge when the
            //current name starts exactly one character after it
            int x = Integer.parseInt(_previous.split(",")[3]) + Integer.parseInt(_previous.split(",")[4]);
            int y = Integer.parseInt(_current.split(",")[4]);
            if(y-x == 1){
                persons.add(_previous.split(",")[1] + " " + _current.split(",")[1]);
                if(scanner.hasNextLine()){
                    _current = scanner.nextLine();
                }
            }else{
                persons.add(_previous.split(",")[1]);
            }
            _previous = _current;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

    for(String person : persons){
        System.out.println(person);
    }

Running it against this piece of sample data

USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Richard,PERSON,7,2732
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2740
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2756
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3093
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3195
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,3220
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,10858
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,11063
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Ken,PERSON,3,11186
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,11234
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,17073
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,17095
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Stephanie,PERSON,9,17330
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Putt,PERSON,4,17340

Which produces this output

Richard Marottoli
Marottoli
Marottoli
Marottoli
Berkowitz
Berkowitz
Marottoli
Lea
Lea
Ken
Marottoli
Berkowitz
Lea
Stephanie Putt

Kind regards

Upvotes: 0

Views: 1622

Answers (1)

Ram Manohar

Reputation: 1004

Load the table using the create table statement below

drop table if exists default.stack;
create external table default.stack (
  junk string,
  name string,
  cat string,
  len int,
  off int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://nameservice1/....';
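
If entities.txt is not already sitting at that HDFS location, one way to get it into the table is a plain LOAD DATA statement. This is only a sketch; the local path below is a placeholder, not a path from the question.

-- Sketch only: move the file into the external table's location.
-- '/tmp/entities.txt' is a placeholder path.
LOAD DATA LOCAL INPATH '/tmp/entities.txt' INTO TABLE default.stack;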

Use the query below to get your desired output.

select max(name), off
from (
  select case when b.name is not null
              then concat(b.name, " ", a.name)
              else a.name
         end as name,
         case when b.off1 is not null
              then b.off1
              else a.off
         end as off
  from default.stack a
  left outer join (select name,
                          len + off + 1 as off,
                          off as off1
                   from default.stack) b
  on a.off = b.off
) a
group by off
order by off;

I have tested this and it generates your desired result.
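
To also get the exact shape shown in the question (one row per full name with the list of its offsets), the same query can be wrapped once more and aggregated with collect_list. This is only a sketch on top of the query above, not something I have run; the fullname and offsets aliases are mine.

-- Sketch: wrap the query above (aliased as fullname/off) and collect
-- all offsets per combined name, mirroring "Jan Janssen [ 100, 160 ]".
select fullname, sort_array(collect_list(off)) as offsets
from (
  select max(name) as fullname, off
  from (
    select case when b.name is not null
                then concat(b.name, " ", a.name)
                else a.name
           end as name,
           case when b.off1 is not null
                then b.off1
                else a.off
           end as off
    from default.stack a
    left outer join (select name,
                            len + off + 1 as off,
                            off as off1
                     from default.stack) b
    on a.off = b.off
  ) x
  group by off
) t
group by fullname;

sort_array is there only because collect_list does not guarantee any ordering of the collected offsets.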

Upvotes: 1
