daszlosek

Reputation: 1505

Generate Data Frame from Count Data

I am trying to create an unsummarized data frame from a data frame of count data.

I have some experience creating sample datasets, but I am having trouble getting a specific number of rows and a specific proportion for each state/person without coding each one separately and then combining them. The following code works, but I feel there must be a better way.

set.seed(2312)
dragon <- sample(c(1),3,replace=TRUE)
Maine  <- sample(c("Maine"),3,replace=TRUE)
Maine1 <- data.frame(dragon, Maine)

dragon <- sample(c(0),20,replace=TRUE)
Maine  <- sample(c("Maine"),20,replace=TRUE)
Maine2 <- data.frame(dragon, Maine)

Maine2

library(dplyr)

maine3 <- bind_rows(Maine1, Maine2)

Is there a better way to generate this dataset than the code above?

I am trying to create a data frame from the following count data:

+-------------+--------------+--------------+
|             | # of dragons | # no dragons |
+-------------+--------------+--------------+
| Maine       |            3 |           20 |
| California  |            1 |           10 |
| Jocko       |           28 |       110515 |
| Jessica Day |           17 |        26122 |
|             |           14 |        19655 |
+-------------+--------------+--------------+

And I would like it to look like this:

+-----------------------+---------------+
|                       | Dragons (1/0) |
+-----------------------+---------------+
| Maine                 | 1             |
| Maine                 | 1             |
| Maine                 | 1             |
| Maine                 | 0             |
| Maine….(2:20)         | 0….           |
| California            | 1             |
| California….(2:10)    | 0…            |
| Etc.                  |               |
+-----------------------+---------------+

I do not want the code written for me, but I would love ideas on functions or examples that you think might be helpful.

Upvotes: 3

Views: 329

Answers (2)

MKR

Reputation: 20095

One can use tidyr::expand to expand the rows into the desired format.

Using the df from @missuse's answer, the solution looks like this:

library(tidyverse)

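df %>% gather(key, value, -names) %>%            # long format: one row per names/key pair
  mutate(key = ifelse(key == "drag", 1, 0)) %>%  # dragon = 1, no dragon = 0
  group_by(names, key) %>%
  expand(value = 1:value) %>%                    # one row per counted observation
  select(names, value = key) %>%                 # keep the 0/1 flag as the value column
  as.data.frame()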

#     names value
# 1       A     0
# 2       A     0
# 3       A     1
# 4       A     1
# 5       A     1
# 6       A     1
# 7       A     1
# 8       A     1
# 9       A     1
# 10      A     1
# ...so on
# 117     E     1
# 118     E     1
# 119     E     1
# 120     E     1
# 121     E     1
# 122     E     1
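
As an alternative sketch, tidyr also provides uncount(), which repeats each row according to a count column. The example below assumes tidyr 1.0.0 or later (for pivot_longer()) and a small data frame, here called counts, holding only the first two rows of the question's table; the column names drag, no_drag and dragon are placeholders chosen for illustration.

```r
library(tidyr)
library(dplyr)

# Assumed input: first two rows of the question's count table
counts <- data.frame(
  names   = c("Maine", "California"),
  drag    = c(3, 1),
  no_drag = c(20, 10)
)

long <- counts %>%
  pivot_longer(c(drag, no_drag), names_to = "key", values_to = "n") %>%
  mutate(dragon = as.integer(key == "drag")) %>%  # 1 = dragon, 0 = no dragon
  uncount(n) %>%                                  # repeat each row n times, dropping n
  select(names, dragon)

nrow(long)  # 3 + 20 + 1 + 10 = 34 rows
```

Unlike expand(value = 1:value), uncount() simply drops rows whose count is 0, which may matter if some state has no dragons at all.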

Upvotes: 2

missuse

Reputation: 19716

I am not completely sure what sampling has to do with this problem. It looks to me like you are looking for untable (from the reshape package).

Here is an example

data:

set.seed(1)
no_drag = sample(1:5, 5)
drag = sample(15:25, 5)
df <- data.frame(names =  LETTERS[1:5],
                 drag,
                 no_drag)

  names drag no_drag
1     A   24       2
2     B   25       5
3     C   20       4
4     D   23       3
5     E   15       1

library(reshape)
library(tidyverse)
df %>%
  gather(key, value, 2:3) %>% #convert to long format 
  {untable(.,num = .$value)} %>% #untable by value column
  mutate(value = ifelse(key == "drag", 1, 0)) %>% #convert values to 1/0 (1 = dragon)
  select(-key) %>% #remove unwanted column
  arrange(names) #optional

#part of output
    names value
1       A     1
2       A     1
3       A     1
4       A     1
5       A     1
6       A     1
7       A     1
8       A     1
9       A     1
10      A     1
11      A     1
12      A     1
13      A     1
14      A     1
15      A     1
16      A     1
17      A     1
18      A     1
19      A     1
20      A     1
21      A     1
22      A     1
23      A     1
24      A     1
25      A     0
26      A     0
27      B     1
28      B     1
29      B     1
30      B     1

There are other ways to tackle the problem.

One is what @Frank suggested in the comments:

df %>%
  gather(key, val, 2:3) %>%
  mutate(v = Map(rep, key == "drag", val)) %>% #list column: TRUE/FALSE repeated val times
  unnest(v) %>% #one row per list element
  select(-key, -val)

Another:

df <- gather(df, key, value, 2:3)
df <- df[rep(seq_len(nrow(df)), df$value), 1:2] #repeat each row `value` times
df$key <- as.integer(df$key == "drag") #1 = dragon, 0 = no dragon
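
The same row-repetition idea can be sketched in base R alone, applied to the question's own counts. This is only an illustration: the data frame counts holds just the first two rows of the question's table, and the helper expand_counts and the column names state and dragon are hypothetical names introduced here.

```r
# Assumed input: first two rows of the question's count table
counts <- data.frame(
  state   = c("Maine", "California"),
  drag    = c(3, 1),
  no_drag = c(20, 10)
)

# Hypothetical helper: one block of 1s followed by one block of 0s per state
expand_counts <- function(state, drag, no_drag) {
  data.frame(
    state  = rep(state, drag + no_drag),
    dragon = rep(c(1L, 0L), times = c(drag, no_drag))
  )
}

# Apply the helper row by row and stack the pieces
unsummarized <- do.call(
  rbind,
  Map(expand_counts, counts$state, counts$drag, counts$no_drag)
)
rownames(unsummarized) <- NULL  # drop the per-state row names added by rbind

head(unsummarized, 4)
```

The key call is rep() with a times vector, which produces the 1s and 0s in one step, so no sampling or per-state copy-and-paste is needed.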

Upvotes: 4
