Reputation: 939
I find myself writing a piece of code in a certain way, but lately I have started to wonder whether I could make it better and more readable. My thoughts went to the tidyverse.
Let me explain:
set.seed(123)
a_df=data.frame(sample=sample(c("A","B", "C"), 50, replace=TRUE),
type= paste0(sample(letters[1:3], 50, replace=TRUE), sample(letters[1:3],50, replace=TRUE)),
area=sample(1:100, 50, replace=TRUE) )
I have a dataframe similar to the one in the example. It has a column called "sample", one called "type" and another "area".
head(a_df)
sample type area
1 C ac 9
2 C ab 71
3 C cb 98
4 B ac 48
5 C ba 77
6 B aa 83
I need to compute some values with the third column for each sample but only for some specific types.
I define my types in "targets" and use a double for loop, looping over the targets and over "sample".
NOTE: I use grep to select the rows of the temporary data frame "tdf". In my real code this step is more complex: instead of "grep" I have a function that takes 2 parameters, the target (a_targ) and the "type" column of the "tdf" data frame. This step, with a function of 2 parameters, must be preserved. I will generalize the answer to my own function.
The idea of the computation is to count how many entries of a given "targets" type there are in a "sample" and divide this number by the sum of the "area" for these entries.
targets=c("ab", "bb")
all_densities=NULL
for(a_targ in targets){
for(i in unique(a_df$sample)){
tdf=a_df[a_df$sample==i, ]
tdf=tdf[grep(a_targ, tdf$type),]
a_dens=nrow(tdf)/sum(tdf$area)
df_res=data.frame(sample=i, type=a_targ, density=a_dens)
all_densities=rbind(all_densities, df_res)
}
}
> head(all_densities)
sample type density
1 C ab 0.01435407
2 B ab 0.02500000
3 A ab 0.01117318
4 C bb 0.02068966
5 B bb NaN
6 A bb 0.03658537
For instance, for sample "A" we can interrogate the data frame as follows:
a_df[a_df$sample=="A" & a_df$type=="bb",]
sample type area
10 A bb 1
21 A bb 67
44 A bb 14
The "density" is 3 (number of rows) divided by "1+67+14", and the result corresponds to the 0.03658537 reported in "all_densities".
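As a quick sanity check, the same number can be reproduced by hand from the three areas listed above. This is only a sketch of the arithmetic; the variable names areas and a_dens are used here just for illustration.

```r
# Areas of the three "bb" rows of sample "A", copied from the output above
areas <- c(1, 67, 14)

# density = number of matching rows / sum of their areas
a_dens <- length(areas) / sum(areas)
print(a_dens)
```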
Would anybody be able to rewrite it with pipes, group_by, and in general in a more tidy way?
Upvotes: 1
Views: 63
Reputation: 7297
You can do it with group_by and summarise (the filtering and arranging after ungroup is just to make the output comparable to yours):
library(dplyr)
a_df |>
group_by(sample, type) |>
summarise(a_dens = n()/sum(area)) |>
ungroup() |>
filter(type %in% c("ab", "bb")) |>
arrange(type, desc(sample))
Output:
# A tibble: 5 × 3
sample type a_dens
<chr> <chr> <dbl>
1 C ab 0.0144
2 B ab 0.025
3 A ab 0.0112
4 C bb 0.0207
5 A bb 0.0366
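If the selection step has to stay a function of 2 parameters (the target and the "type" column), one possible sketch is to filter with that function inside lapply and bind the per-target results. Everything below is an assumption made for illustration: match_type is a hypothetical stand-in for the OP's real function (here it just wraps grepl), and the small a_df is toy data, not the data from the question.

```r
library(dplyr)

# Toy data (assumed for illustration only, not the OP's data)
a_df <- data.frame(
  sample = c("A", "A", "A", "B", "B"),
  type   = c("bb", "bb", "ab", "ab", "cc"),
  area   = c(1, 67, 14, 40, 5)
)

# Hypothetical stand-in for the OP's two-parameter function:
# takes one target and the whole `type` column, returns a logical vector
match_type <- function(a_targ, type) grepl(a_targ, type)

targets <- c("ab", "bb")

all_densities <- lapply(targets, function(a_targ) {
  a_df |>
    filter(match_type(a_targ, type)) |>   # keep rows matching this target
    group_by(sample) |>
    summarise(type = a_targ,              # label the rows with the target
              density = n() / sum(area),  # row count / summed area
              .groups = "drop")
}) |>
  bind_rows()

print(all_densities)
```

Note that, like the pipeline above, this drops sample/target combinations with no matching rows instead of returning NaN for them as the for loop does.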
Updated after the OP's update (change of formula).
Upvotes: 2