mindlessgreen
mindlessgreen

Reputation: 12112

R Expand frequencies to raw counts

If I have data like this

x <- data.frame(id=letters[1:4],v1=c(3,0,0,1),v2=c(1,1,0,1),v3=c(1,0,0,0),v4=c(2,0,2,0))

id  v1  v2  v3  v4
a   3   1   1   2
b   0   1   0   0
c   0   0   0   2
d   1   1   0   0

How is the best way to change it to

id  v1  v2  v3  v4
a   1   1   1   1
a   1   NA  NA  1
a   1   NA  NA  NA
b   0   1   0   0
c   0   0   0   1
c   NA  NA  NA  1
d   1   1   0   0

where the data is only composed of 1s and 0s. Each ID now gets multiple rows if needed. Empty positions are populated with NAs.

Upvotes: 3

Views: 123

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35584

You can use uncount() from tidyr.

library(dplyr)
library(tidyr)

x %>%
  uncount(apply(select(x, v1:v4), 1, max)) %>%
  group_by(id) %>% 
  mutate(across(v1:v4, ~ {if(first(.) == 0) 0 else rep(1, first(.))}[1:n()]))

# # A tibble: 7 x 5
# # Groups:   id [4]
#   id       v1    v2    v3    v4
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 a         1     1     1     1
# 2 a         1    NA    NA     1
# 3 a         1    NA    NA    NA
# 4 b         0     1     0     0
# 5 c         0     0     0     1
# 6 c        NA    NA    NA     1
# 7 d         1     1     0     0

Upvotes: 2

lroha
lroha

Reputation: 34511

tidyr::uncount() is useful here but it's necessary to account for 0 values so they're not dropped:

library(tidyr)
library(dplyr)

x %>%
  pivot_longer(-id) %>%
  uncount(value + (value == 0)) %>%
  group_by(id, name) %>%
  mutate(rn = row_number(), 
         value = +(value != 0)) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-rn)

# A tibble: 7 x 5
# Groups:   id [4]
  id       v1    v2    v3    v4
  <chr> <int> <int> <int> <int>
1 a         1     1     1     1
2 a         1    NA    NA     1
3 a         1    NA    NA    NA
4 b         0     1     0     0
5 c         0     0     0     1
6 c        NA    NA    NA     1
7 d         1     1     0     0

Upvotes: 2

Related Questions