Reputation: 3088
I have a huge data frame that looks like this:
gene=c("A","A","A","A","B","B")
frequency=c(abs(rnorm(6,0.5,1)))
time=c(1,2,3,4,1,2)
df <- data.frame(gene,frequency,time)
gene frequency time
1 A 0.08463914 1
2 A 1.55639512 2
3 A 1.24172246 3
4 A 0.75038980 4
5 B 1.13189855 1
6 B 0.56896895 2
For gene B, I have data only for time points 1 and 2. I want to fill in time points 3 and 4 with zeros so that my data looks like this:
gene frequency time
1 A 0.08463914 1
2 A 1.55639512 2
3 A 1.24172246 3
4 A 0.75038980 4
5 B 1.13189855 1
6 B 0.56896895 2
7 B 0 3
8 B 0 4
Overall, I have multiple groups (i.e., genes) that I want to do this for. Any help or hints are highly appreciated.
Upvotes: 4
Views: 1448
Reputation: 78927
akrun's answer is best! Here is a way with pivoting:
In essence: the pivoting step produces NA values, which can then be replaced by 0:
library(tidyr)
library(dplyr)
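df %>%
  # one column per gene; missing gene/time combinations become NA
  pivot_wider(
    names_from = gene,
    values_from = frequency
  ) %>%
  # back to long format; -time selects every gene column, however many there are
  pivot_longer(
    cols = -time,
    names_to = "gene",
    values_to = "frequency"
  ) %>%
  mutate(frequency = replace_na(frequency, 0)) %>%
  arrange(gene, time) %>%
  select(-time, time)  # move time to the last column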
gene frequency time
<chr> <dbl> <dbl>
1 A 1.00 1
2 A 0.413 2
3 A 0.539 3
4 A 1.08 4
5 B 0.473 1
6 B 1.79 2
7 B 0 3
8 B 0 4
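To see where the NA values come from, it can help to look at the intermediate wide table on its own. The following is a minimal sketch, not part of the original answer; the fixed frequency values are made up for reproducibility (the question uses rnorm()).

```r
library(tidyr)

# Fixed example values (an assumption; the question draws them with rnorm()).
df <- data.frame(
  gene      = c("A", "A", "A", "A", "B", "B"),
  frequency = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  time      = c(1, 2, 3, 4, 1, 2)
)

# One row per time point, one column per gene.
# Gene B has no rows for time 3 and 4, so those cells are filled with NA.
wide <- pivot_wider(df, names_from = gene, values_from = frequency)
wide
```

Pivoting back to long format keeps those NA cells as rows, which is why replace_na() can then turn them into zeros.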
Upvotes: 2
Reputation: 887118
We can use complete() from tidyr:
library(dplyr)
library(tidyr)
df %>%
  complete(gene, time = 1:4, fill = list(frequency = 0)) %>%
  select(names(df))
-output
# A tibble: 8 x 3
gene frequency time
<chr> <dbl> <dbl>
1 A 0.590 1
2 A 0.762 2
3 A 0.336 3
4 A 0.437 4
5 B 0.904 1
6 B 1.97 2
7 B 0 3
8 B 0 4
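If the set of time points is not known in advance, a possible variation (a sketch, not part of the original answer) is to pass the bare column name, so that complete() uses every time value observed for any gene instead of a hard-coded 1:4. The fixed frequency values below are made up for reproducibility.

```r
library(dplyr)
library(tidyr)

# Fixed example values (an assumption; the question draws them with rnorm()).
df <- data.frame(
  gene      = c("A", "A", "A", "A", "B", "B"),
  frequency = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  time      = c(1, 2, 3, 4, 1, 2)
)

filled <- df %>%
  # every gene is crossed with every time value that occurs anywhere in df
  complete(gene, time, fill = list(frequency = 0)) %>%
  # restore the original column order
  select(all_of(names(df)))

filled
```

This only adds time points that at least one gene already has. If a time point could be missing for all genes, something like time = full_seq(time, 1) inside complete() should cover the full range, assuming the time points are evenly spaced by 1.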
Upvotes: 9