Reputation: 510
I want to create a data frame with rows that repeat.
Here is my original dataset:
> mtcars_columns_a
variables_interest data_set data_set_and_variables_interest mean
1 mpg mtcars mtcars$mpg 20.09062
2 disp mtcars mtcars$disp 230.72188
3 hp mtcars mtcars$hp 146.68750
Here is my desired dataset:
> mtcars_columns_b
variables_interest data_set data_set_and_variables_interest mean
1 mpg mtcars mtcars$mpg 20.09062
2 mpg mtcars mtcars$mpg 20.09062
3 disp mtcars mtcars$disp 230.72188
4 disp mtcars mtcars$disp 230.72188
5 hp mtcars mtcars$hp 146.68750
6 hp mtcars mtcars$hp 146.68750
I know how to do this the long way manually, but that is time-consuming and rigid. Is there a quicker way that is more automated and flexible?
Here is the code I used to create the dataset:
# mtcars data
## displays data
mtcars
## 3 row data set
### lists columns of interest
# ---- NOTE: REQUIRES MANUAL INPUT
# ---- NOTE: lists variables of interest
mtcars_columns_a <-
data.frame(
c(
"mpg",
"disp",
"hp"
)
)
# ---- NOTE: REQUIRES MANUAL INPUT
# ---- NOTE: adds colnames
names(mtcars_columns_a)[names(mtcars_columns_a) == 'c..mpg....disp....hp..'] <- 'variables_interest'
### adds data set info
mtcars_columns_a$data_set <-
c("mtcars")
### creates data_set_and_variables_interest column
mtcars_columns_a$data_set_and_variables_interest <-
paste(mtcars_columns_a$data_set,mtcars_columns_a$variables_interest,sep = "$")
### creates mean column
mtcars_columns_a$mean <-
c(
mean(mtcars$mpg),
mean(mtcars$disp),
mean(mtcars$hp)
)
## 6 row data set, the long way
### lists columns of interest
# ---- NOTE: REQUIRES MANUAL INPUT
# ---- NOTE: lists variables of interest
mtcars_columns_b <-
data.frame(
c(
"mpg",
"mpg",
"disp",
"disp",
"hp",
"hp"
)
)
# ---- NOTE: REQUIRES MANUAL INPUT
# ---- NOTE: adds colnames
names(mtcars_columns_b)[names(mtcars_columns_b) == 'c..mpg....mpg....disp....disp....hp....hp..'] <- 'variables_interest'
### adds data set info
mtcars_columns_b$data_set <-
c("mtcars")
### creates data_set_and_variables_interest column
mtcars_columns_b$data_set_and_variables_interest <-
paste(mtcars_columns_b$data_set,mtcars_columns_b$variables_interest,sep = "$")
### creates mean column
mtcars_columns_b$mean <-
c(
mean(mtcars$mpg),
mean(mtcars$mpg),
mean(mtcars$disp),
mean(mtcars$disp),
mean(mtcars$hp),
mean(mtcars$hp)
)
Upvotes: 0
Views: 68
Reputation: 886938
Another option is uncount from tidyr:
library(dplyr)
library(tidyr)
mtcars_columns_a %>%
uncount(2)
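For context, uncount() duplicates each row according to a weight, so a constant 2 repeats every row twice while keeping the copies next to each other. A minimal sketch, assuming tidyr is installed; the small `df` and the `.id` column name `copy` are made up for illustration and are not part of the question:

```r
library(tidyr)

# Cut-down stand-in for mtcars_columns_a (illustrative only)
df <- data.frame(variables_interest = c("mpg", "disp", "hp"),
                 stringsAsFactors = FALSE)

# Repeat each row twice; .id adds a counter for the copies of each row
df_twice <- uncount(df, 2, .id = "copy")
```

Passing a column name instead of a constant as the weight would let each row repeat a different number of times.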
Upvotes: 3
Reputation: 101024
You can try rep, like below:
mtcars_columns_a[rep(seq(nrow(mtcars_columns_a)), each = 2),]
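If this is needed in more than one place, the indexing trick can be wrapped in a small helper. This is only a sketch: `repeat_rows` is a hypothetical name, not a base R function, and the data frame is rebuilt here from the question so the snippet runs on its own.

```r
# Rebuild the 3-row data frame from the question
vars <- c("mpg", "disp", "hp")
mtcars_columns_a <- data.frame(
  variables_interest = vars,
  data_set = "mtcars",
  data_set_and_variables_interest = paste("mtcars", vars, sep = "$"),
  mean = unname(colMeans(mtcars[vars])),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat every row of `df` `times` times, keeping row order
repeat_rows <- function(df, times = 2) {
  out <- df[rep(seq_len(nrow(df)), each = times), , drop = FALSE]
  rownames(out) <- NULL  # replace row names like "1", "1.1" with 1..n
  out
}

mtcars_columns_b <- repeat_rows(mtcars_columns_a, times = 2)
```

Using seq_len(nrow(df)) rather than seq(nrow(df)) also behaves sensibly when the data frame has zero rows.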
Upvotes: 2
Reputation: 1441
The order of records in a data.frame object is usually not meaningful, so you could just do:
rbind(mtcars_columns_a, mtcars_columns_a)
If you need it to be in the order you showed, this is also simple:
mtcars_columns_b <- rbind(mtcars_columns_a, mtcars_columns_a)
mtcars_columns_b[order(match(mtcars_columns_b$variables_interest, mtcars_columns_a$variables_interest)),]
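One way to get back the order shown in the question after rbind is to sort on each row's position in the original data frame rather than alphabetically. A minimal sketch using a cut-down version of the data frame; the names `a` and `b` are placeholders:

```r
# Cut-down stand-in for mtcars_columns_a (illustrative only)
vars <- c("mpg", "disp", "hp")
a <- data.frame(variables_interest = vars,
                mean = unname(colMeans(mtcars[vars])),
                stringsAsFactors = FALSE)

# Stack two copies, then sort by where each name appears in `a`
b <- rbind(a, a)
b <- b[order(match(b$variables_interest, a$variables_interest)), ]
rownames(b) <- NULL  # tidy up row names after reordering
```

Sorting on the column directly with order(b$variables_interest) would instead give alphabetical order (disp, hp, mpg).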
Upvotes: 2
Reputation: 12699
Based on your expected output, is this the sort of thing you were after?
The required variables are selected with the select function, and the mean is calculated with the summarise function after grouping by variable with group_by.
The duplication of data is carried out using bind_rows, and the additional variables (not really sure if these are necessary) are added using mutate.
You can edit variable names using the dplyr::rename function.
library(dplyr)
library(tidyr)
df <-
mtcars %>%
select(mpg, disp, hp) %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value))
df1 <-
bind_rows(df, df) %>%
arrange(name) %>%
mutate(dataset = "mtcars",
variable = paste(dataset, name, sep = "$"))
df1
#> # A tibble: 6 x 4
#> name mean dataset variable
#> <chr> <dbl> <chr> <chr>
#> 1 disp 231. mtcars mtcars$disp
#> 2 disp 231. mtcars mtcars$disp
#> 3 hp 147. mtcars mtcars$hp
#> 4 hp 147. mtcars mtcars$hp
#> 5 mpg 20.1 mtcars mtcars$mpg
#> 6 mpg 20.1 mtcars mtcars$mpg
Created on 2021-04-06 by the reprex package (v1.0.0)
Upvotes: 2