Reputation: 7396
df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'),
freq = 1:3)
What is the simplest way to expand each row the first two columns of the data.frame above, so that each row is repeated the number of times specified in the column 'freq'?
In other words, go from this:
df
var1 var2 freq
1 a d 1
2 b e 2
3 c f 3
To this:
df.expanded
var1 var2
1 a d
2 b e
3 b e
4 c f
5 c f
6 c f
Upvotes: 204
Views: 122854
Reputation: 65
I am providing one more addition to this wonderful thread of nice answers! Use the tidyr
package (included in tidyverse
) for a one-liner solution:
df %>% tidyr::uncount(weights = freq)
Upvotes: 4
Reputation: 1970
in fact. use the methods of vector and index. we can also achieve the same result, and more easier to understand:
rawdata <- data.frame('time' = 1:3,
'x1' = 4:6,
'x2' = 7:9,
'x3' = 10:12)
rawdata[rep(1, time=2), ] %>% remove_rownames()
# time x1 x2 x3
# 1 1 4 7 10
# 2 1 4 7 10
Upvotes: 0
Reputation: 652
I know this is not the case but if you need to keep the original freq column, you can use another tidyverse
approach together with rep
:
library(purrr)
df <- data.frame(var1 = c('a', 'b', 'c'), var2 = c('d', 'e', 'f'), freq = 1:3)
df %>%
map_df(., rep, .$freq)
#> # A tibble: 6 x 3
#> var1 var2 freq
#> <fct> <fct> <int>
#> 1 a d 1
#> 2 b e 2
#> 3 b e 2
#> 4 c f 3
#> 5 c f 3
#> 6 c f 3
Created on 2019-12-21 by the reprex package (v0.3.0)
Upvotes: 8
Reputation: 388797
Another dplyr
alternative with slice
where we repeat each row number freq
times
library(dplyr)
df %>%
slice(rep(seq_len(n()), freq)) %>%
select(-freq)
# var1 var2
#1 a d
#2 b e
#3 b e
#4 c f
#5 c f
#6 c f
seq_len(n())
part can be replaced with any of the following.
df %>% slice(rep(1:nrow(df), freq)) %>% select(-freq)
#Or
df %>% slice(rep(row_number(), freq)) %>% select(-freq)
#Or
df %>% slice(rep(seq_len(nrow(.)), freq)) %>% select(-freq)
Upvotes: 10
Reputation: 28826
Another possibility is using tidyr::expand
:
library(dplyr)
library(tidyr)
df %>% group_by_at(vars(-freq)) %>% expand(temp = 1:freq) %>% select(-temp)
#> # A tibble: 6 x 2
#> # Groups: var1, var2 [3]
#> var1 var2
#> <fct> <fct>
#> 1 a d
#> 2 b e
#> 3 b e
#> 4 c f
#> 5 c f
#> 6 c f
One-liner version of vonjd's answer:
library(data.table)
setDT(df)[ ,list(freq=rep(1,freq)),by=c("var1","var2")][ ,freq := NULL][]
#> var1 var2
#> 1: a d
#> 2: b e
#> 3: b e
#> 4: c f
#> 5: c f
#> 6: c f
Created on 2019-05-21 by the reprex package (v0.2.1)
Upvotes: 5
Reputation: 15783
@neilfws's solution works great for data.frame
s, but not for data.table
s since they lack the row.names
property. This approach works for both:
df.expanded <- df[rep(seq(nrow(df)), df$freq), 1:2]
The code for data.table
is a tad cleaner:
# convert to data.table by reference
setDT(df)
df.expanded <- df[rep(seq(.N), freq), !"freq"]
Upvotes: 28
Reputation: 2665
old question, new verb in tidyverse:
library(tidyr) # version >= 0.8.0
df <- data.frame(var1=c('a', 'b', 'c'), var2=c('d', 'e', 'f'), freq=1:3)
df %>%
uncount(freq)
var1 var2
1 a d
2 b e
2.1 b e
3 c f
3.1 c f
3.2 c f
Upvotes: 121
Reputation: 23004
Use expandRows()
from the splitstackshape
package:
library(splitstackshape)
expandRows(df, "freq")
Simple syntax, very fast, works on data.frame
or data.table
.
Result:
var1 var2
1 a d
2 b e
2.1 b e
3 c f
3.1 c f
3.2 c f
Upvotes: 49
Reputation: 4380
In case you have to do this operation on very large data.frames I would recommend converting it into a data.table and use the following, which should run much faster:
library(data.table)
dt <- data.table(df)
dt.expanded <- dt[ ,list(freq=rep(1,freq)),by=c("var1","var2")]
dt.expanded[ ,freq := NULL]
dt.expanded
See how much faster this solution is:
df <- data.frame(var1=1:2e3, var2=1:2e3, freq=1:2e3)
system.time(df.exp <- df[rep(row.names(df), df$freq), 1:2])
## user system elapsed
## 4.57 0.00 4.56
dt <- data.table(df)
system.time(dt.expanded <- dt[ ,list(freq=rep(1,freq)),by=c("var1","var2")])
## user system elapsed
## 0.05 0.01 0.06
Upvotes: 8
Reputation: 33772
Here's one solution:
df.expanded <- df[rep(row.names(df), df$freq), 1:2]
Result:
var1 var2
1 a d
2 b e
2.1 b e
3 c f
3.1 c f
3.2 c f
Upvotes: 202