Luckylukee

Reputation: 595

Joining two h2o dataframes

I have two H2O frames and I want to join them on a column that exists in both. I am using the Java API and create the H2O frames from Spark DataFrames.

    H2OFrame trainDataFrame = h2oContext.asH2OFrame(train_validation); 
    H2OFrame validationDataFrame = h2oContext.asH2OFrame(train_validation);
    H2OFrame testDataFrame = h2oContext.asH2OFrame(testSparkDataFrame); 

I can't use Spark DataFrames for the join because my data is really big and RDDs don't work out here, so I need to work with the H2O frames as in-memory objects.

Upvotes: 3

Views: 1511

Answers (1)

Jsimp

Reputation: 58

Have a look at the h2o.merge() command from the H2O R package. It joins two H2O frames on the column(s) they have in common:

    # Currently, this function only supports `all.x = TRUE`. All other permutations will fail.
    library(h2o)
    h2o.init()

    # Create two simple, two-column R data frames by inputting values, ensuring that both have a common column (in this case, "fruit").
    left <- data.frame(fruit = c('apple','orange','banana','lemon','strawberry','blueberry'),
                       color = c('red','orange','yellow','yellow','red','blue'))
    right <- data.frame(fruit = c('apple','orange','banana','lemon','strawberry','watermelon'),
                        citrus = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE))

    # Create the H2O data frames from the inputted data.
    l.hex <- as.h2o(left)
    print(l.hex)
    #        fruit  color
    # 1      apple    red
    # 2     orange orange
    # 3     banana yellow
    # 4      lemon yellow
    # 5 strawberry    red
    # 6  blueberry   blue
    #
    # [6 rows x 2 columns]

    r.hex <- as.h2o(right)
    print(r.hex)
    #        fruit citrus
    # 1      apple  FALSE
    # 2     orange   TRUE
    # 3     banana  FALSE
    # 4      lemon   TRUE
    # 5 strawberry  FALSE
    # 6 watermelon  FALSE
    #
    # [6 rows x 2 columns]

    # Merge the data frames. The result is a single dataset with three columns.
    left.hex <- h2o.merge(l.hex, r.hex, all.x = TRUE)
    print(left.hex)
    #        fruit  color citrus
    # 1  blueberry   blue   <NA>
    # 2      apple    red  FALSE
    # 3     banana yellow  FALSE
    # 4      lemon yellow   TRUE
    # 5     orange orange   TRUE
    # 6 strawberry    red  FALSE
    #
    # [6 rows x 3 columns]
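
Since the question is about Java, here is a rough plain-Java sketch of what `all.x = TRUE` means semantically: a left outer join, where every row of the left frame is kept and rows without a match on the right get a missing value. This is only an illustration on ordinary collections, not the H2O Java API. The class `LeftJoinSketch` and its methods are hypothetical names, and it assumes the key column (`fruit`) is unique on both sides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain Java collections, not the H2O API.
final class LeftJoinSketch {

    // Left outer join on the map key. Every left row is kept; the right-hand
    // value is attached when the key matches, otherwise the slot stays null
    // (the case H2O prints as <NA>). Right-only keys are dropped.
    static List<String[]> leftJoin(Map<String, String> left, Map<String, Boolean> right) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : left.entrySet()) {
            Boolean citrus = right.get(entry.getKey());
            rows.add(new String[] {
                entry.getKey(),
                entry.getValue(),
                citrus == null ? null : citrus.toString().toUpperCase()
            });
        }
        return rows;
    }

    // Same sample data as the R example: fruit -> color.
    static Map<String, String> sampleLeft() {
        Map<String, String> left = new LinkedHashMap<>();
        left.put("apple", "red");
        left.put("orange", "orange");
        left.put("banana", "yellow");
        left.put("lemon", "yellow");
        left.put("strawberry", "red");
        left.put("blueberry", "blue");
        return left;
    }

    // Same sample data as the R example: fruit -> citrus flag.
    static Map<String, Boolean> sampleRight() {
        Map<String, Boolean> right = new LinkedHashMap<>();
        right.put("apple", false);
        right.put("orange", true);
        right.put("banana", false);
        right.put("lemon", true);
        right.put("strawberry", false);
        right.put("watermelon", false);
        return right;
    }

    public static void main(String[] args) {
        // Print one joined row per line, showing missing values as <NA>.
        for (String[] row : leftJoin(sampleLeft(), sampleRight())) {
            String citrus = row[2] == null ? "<NA>" : row[2];
            System.out.println(row[0] + " " + row[1] + " " + citrus);
        }
    }
}
```

As in the merged H2O frame, blueberry is kept with a missing citrus value and watermelon does not appear at all. The sketch keeps the left frame's row order, whereas h2o.merge() returned the rows in a different order in the output above.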

Upvotes: 1
