Patrick Bucher

Reputation: 1538

R: Combining data.frames in a more elegant way

I'm building up a data frame based on random entries/rows. Here's the function that creates a random entry:

createRandomEntry <- function() {
    names <- c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper')
    ages <- 30:45
    return(
        data.frame(
            Name = sample(names, 1),
            Age = sample(ages, 1),
            stringsAsFactors = FALSE
        )
    )
}

Now I'm combining them into one big data.frame using this function:

createRandomEntries <- function(n) {
    df <- createRandomEntry()
    for (i in seq_len(n - 1)) {  # unlike 2:n, this does not count down to 1 when n == 1
        df <- rbind(df, createRandomEntry())
    }
    return(df)
}

Technically, it works well, but it's a bit clumsy for many reasons:

In an earlier version, createRandomEntry() returned a list rather than a data.frame. I then used replicate() to build a matrix, which first had to be transposed with t() before it could be turned into a data.frame. That data.frame could not be sorted (error: "unimplemented type 'list' in 'orderVector1'"). Calling unlist() on every row, or returning a vector from createRandomEntry(), would fix the sorting issue, but then every column would contain only strings.
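A minimal sketch that reproduces the problem described above. The list-returning helper createRandomEntryList() is a hypothetical reconstruction for illustration, not the actual earlier code:

```r
# Hypothetical reconstruction of the earlier, list-returning version.
createRandomEntryList <- function() {
    list(
        Name = sample(c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper'), 1),
        Age = sample(30:45, 1)
    )
}

# replicate() simplifies the three lists into a 2 x 3 matrix of mode "list";
# t() turns it so that each entry becomes a row.
m <- t(replicate(3, createRandomEntryList()))
df <- as.data.frame(m)

# Every column is itself a list rather than an atomic vector ...
columnIsList <- sapply(df, is.list)

# ... so order() cannot sort by it and the subsetting fails.
sortAttempt <- try(df[order(df$Age), ], silent = TRUE)
```

Here columnIsList is TRUE for both columns, and sortAttempt holds the error instead of a sorted data.frame.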

There must be a better way. But how?

Edit: It's important to have a function that creates one single entry, because some of the values of an entry could be related to each other, as this enhanced function shows:

createRandomEntry <- function() {
    names <- c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper')
    ages <- 30:45
    startedIn <- sample(1995:2005, 1)
    lostMotivation <- startedIn + sample(1:3, 1)
    return(
        data.frame(
            Name = sample(names, 1),
            Age = sample(ages, 1),
            StartYear = startedIn,
            LostMotivation = lostMotivation,
            stringsAsFactors = FALSE
        )
    )
}
createRandomEntries(3)

Which produces:

     Name Age StartYear LostMotivation
1   Ashok  42      1998           2000
2 Dilbert  43      1997           1999
3 Dilbert  30      1996           1999

Upvotes: 0

Views: 52

Answers (2)

Patrick Bucher

Reputation: 1538

Based on Bruno Zamengo's answer, I've now rewritten the function:

createRandomEntries <- function(n) {
    names <- c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper')
    ages <- 30:45
    df <- data.frame(
        Name = sample(names, n, replace = TRUE),
        Age = sample(ages, n, replace = TRUE),
        StartYear = sample(1995:2005, n, replace = TRUE),
        stringsAsFactors = FALSE
    )
    df$LostMotivation <- df$StartYear + sample(1:3, n, replace = TRUE)
    return(df)
}

However, I didn't use merge, as suggested.
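As a quick check, the following sketch shows that the relation between StartYear and LostMotivation still holds row by row in the vectorised version, and that the result can be sorted. The function is repeated in compact form only so the block runs on its own; the seed and the variable names entries, gap and sorted are arbitrary choices for illustration:

```r
# Compact copy of the vectorised generator, so this block is self-contained.
createRandomEntries <- function(n) {
    df <- data.frame(
        Name = sample(c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper'), n, replace = TRUE),
        Age = sample(30:45, n, replace = TRUE),
        StartYear = sample(1995:2005, n, replace = TRUE),
        stringsAsFactors = FALSE
    )
    # Element-wise addition: the offset drawn for row i is added to row i's StartYear.
    df$LostMotivation <- df$StartYear + sample(1:3, n, replace = TRUE)
    df
}

set.seed(1)  # arbitrary seed, only to make the run repeatable
entries <- createRandomEntries(5)

# The per-row dependency is preserved: every gap is 1, 2 or 3 years.
gap <- entries$LostMotivation - entries$StartYear

# All columns are atomic vectors, so sorting works without further conversion.
sorted <- entries[order(entries$Age, entries$Name), ]
```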

Upvotes: 0

Bruno Zamengo

Reputation: 860

Just move n from the second function to the first one, so that sample() draws all n values at once?

createRandomEntries <- function(n) {
    names <- c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper')
    ages <- 30:45
    return(
        data.frame(
            Name = sample(names, n, TRUE),
            Age = sample(ages, n, TRUE),
            stringsAsFactors = FALSE
        )
    )
}
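If the one-row createRandomEntry() from the question has to stay, an alternative sketch is to collect the rows in a list and bind them in a single rbind() call instead of growing the data.frame inside a loop. This assumes n is at least 1; for n = 0, do.call(rbind, list()) returns NULL rather than an empty data.frame:

```r
# One-row generator, taken from the question's enhanced version.
createRandomEntry <- function() {
    names <- c('Dilbert', 'Wally', 'Alice', 'Ashok', 'Topper')
    ages <- 30:45
    startedIn <- sample(1995:2005, 1)
    lostMotivation <- startedIn + sample(1:3, 1)
    data.frame(
        Name = sample(names, 1),
        Age = sample(ages, 1),
        StartYear = startedIn,
        LostMotivation = lostMotivation,
        stringsAsFactors = FALSE
    )
}

createRandomEntries <- function(n) {
    # Build a list of n one-row data.frames ...
    rows <- lapply(seq_len(n), function(i) createRandomEntry())
    # ... and bind them all with one rbind() call.
    do.call(rbind, rows)
}

result <- createRandomEntries(4)
```

This keeps the related values of one entry together in a single function, at the cost of creating n small data.frames, which is likely slower than the vectorised version for large n.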

Upvotes: 3
