Amaranta_Remedios

Reputation: 773

Calculate ratio every two rows with partial string matches

I am trying to calculate a ratio using this formula: log2(_5p/_3p).

I have a data frame in R whose entries come in pairs that share the same name except for the last part, which is either _3p or _5p. I want to compute log2(_5p/_3p) for each name.

For instance, for the first two rows the result would be:

LQNS02277998.1_30988 log2(40/148)= -1.887525

Ideally, I want to create a new data frame with the results, where only the common part of the name is kept:

LQNS02277998.1_30988 -1.887525

How can I do this in R?

> head(dup_res_LC1_b_2)
# A tibble: 6 x 2
  microRNAs                                  n
  <chr>                                  <int>
1 LQNS02277998.1_30988_3p                  148
2 LQNS02277998.1_30988_5p                   40
3 Dpu-Mir-279-o6_LQNS02278070.1_31942_3p     4
4 Dpu-Mir-279-o6_LQNS02278070.1_31942_5p     4
5 LQNS02000138.1_777_3p                     73
6 LQNS02000138.1_777_5p                     12


structure(list(microRNAs = c("LQNS02277998.1_30988_3p", 
"LQNS02277998.1_30988_5p", "Dpu-Mir-279-o6_LQNS02278070.1_31942_3p", 
"Dpu-Mir-279-o6_LQNS02278070.1_31942_5p", "LQNS02000138.1_777_3p", 
"LQNS02000138.1_777_5p"), n = c(148L, 40L, 4L, 4L, 73L, 12L)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Views: 101

Answers (1)

akrun

Reputation: 887078

We can group by the name with the trailing substring (_3p or _5p) removed using str_remove, then take log2 of the ratio of the pair of 'n' values within each group:

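library(dplyr)
library(stringr)
# df1 stands for the data frame from the question (dup_res_LC1_b_2)
df1 %>% 
   # drop the trailing _3p/_5p so both rows of a pair share one group key
   group_by(grp = str_remove(microRNAs, "_[^_]+$")) %>% 
   # relies on row order within each pair: _3p first, _5p last
   mutate(new = log2(last(n)/first(n)))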
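
If what is needed is a new data frame with one row per name, as the question asks, one possible variant is to use summarise instead of mutate and to pick the _5p and _3p counts explicitly rather than by row order. This is only a sketch and not part of the original answer: the column names name, arm, and log2_ratio are made up for illustration, and it assumes every name has exactly one _3p row and one _5p row.

```r
library(dplyr)
library(stringr)

# Example data rebuilt from the dput() output in the question
dup_res_LC1_b_2 <- tibble(
  microRNAs = c("LQNS02277998.1_30988_3p",
                "LQNS02277998.1_30988_5p",
                "Dpu-Mir-279-o6_LQNS02278070.1_31942_3p",
                "Dpu-Mir-279-o6_LQNS02278070.1_31942_5p",
                "LQNS02000138.1_777_3p",
                "LQNS02000138.1_777_5p"),
  n = c(148L, 40L, 4L, 4L, 73L, 12L)
)

res <- dup_res_LC1_b_2 %>%
  mutate(
    # common part of the name, without the trailing _3p/_5p
    name = str_remove(microRNAs, "_[35]p$"),
    # which arm the row belongs to: "3p" or "5p"
    arm = str_extract(microRNAs, "[35]p$")
  ) %>%
  group_by(name) %>%
  # assumption: exactly one 5p and one 3p row per name
  summarise(log2_ratio = log2(n[arm == "5p"] / n[arm == "3p"]),
            .groups = "drop")

res
```

Because the counts are selected by their suffix, the result does not depend on the _3p row coming before the _5p row. For LQNS02277998.1_30988 this gives log2(40/148), the value shown in the question.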

Upvotes: 1
