Eliza Romanski
Eliza Romanski

Reputation: 39

convert the result of by() function (a list) to data frame in R-Edited

I have data that looks like this :

   is_severe     encoding    sn_id
6           1        1   chr1 17689
7           0        2   chr1 17689
8           1        1   chr1 17689
9           1        2   chr1 69511
10          1        2   chr1 69511
11          1        1   chr1 69511
12          0        1   chr1 69511

I performed a statistical test on every "group" of values based on the sn_id column.

this is the function for the statistical test:

catt <-

  function(y, x, score = c(0, 1, 2)) {
    miss <- unique(c(which(is.na(y)), which(is.na(x))))
    n.miss <- length(miss)
    if(n.miss > 0) {
      y <- y[-miss]
      x <- x[-miss]
    }
    if(!all((y == 0) | (y == 1))) 
      stop("y should be only 0 or 1.")
    if(!all((x == 0) | (x == 1) |(x == 2))) 
      stop("x should be only 0, 1 or 2.")
    ca <- x [y == 1]
    co <- x [y == 0]
    htca <- table(ca)
    htco <- table(co)
    A <- matrix(0, 2, 3)
    colnames(A) <- c(0, 1, 2)
    rownames(A) <- c(0, 1)
    A[1, names(htca)] <- htca
    A[2, names(htco)] <- htco
    ptt <- prop.trend.test(A[1, ], colSums(A), score = score) 
    p.value = as.numeric(ptt$p.value) 
    res=p.value
    return(res)}

and i performed it on the groups of snp_id using the by function:

send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt(merged_df_normal$is_sever_int,merged_df_normal$encoding)})

and got these results for example :

merged_df_normal$snp_id: chr11441806
[1] 0.6274769
--------------------------------------------------------------------- 
merged_df_normal$snp_id: chr1144192891
[1] NA

i wanted to transform this into a data frame which will look like this:

snp_id                     pvalue
chr11441806                  0.6274769
chr1144192891                 NA

I tried this :

do.call(rbind,list(send)

and it returned a matrix that looks like this:

chr11441806      chr1144192891         
0.6274769          NA

I had to edit the function after accepting an answer :

catt_2 <-
  function(y, x, score = c(0, 1, 2)) {
    miss <- unique(c(which(is.na(y)), which(is.na(x))))
    n.miss <- length(miss)
    if(n.miss > 0) {
      y <- y[-miss]
      x <- x[-miss]
    }
    if(!all((y == 0) | (y == 1))) 
      stop("y should be only 0 or 1.")
    if(!all((x == 0) | (x == 1) |(x == 2))) 
      stop("x should be only 0, 1 or 2.")
    ca <- x [y == 1]
    co <- x [y == 0]
    htca <- table(ca)
    htco <- table(co)
    A <- matrix(0, 2, 3)
    colnames(A) <- c(0, 1, 2)
    rownames(A) <- c(0, 1)
    A[1, names(htca)] <- htca
    A[2, names(htco)] <- htco
    ptt <- prop.trend.test(A[1, ], colSums(A), score = score)
    res <- list(
                chisq = as.numeric(ptt$statistic), 
                 
                p.value = as.numeric(ptt$p.value)
                )
    return(res)
  }

and now the results are :

send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt_2(merged_df_normal$is_sever,merged_df_normal$encoding)})


    merged_df_normal$snp_id: chr11007252
$chisq
[1] NA

$p.value
[1] NA

------------------------------------------------------------------------ 
merged_df_normal$snp_id: chr1100731820
$chisq
[1] 0.9111779

$p.value
[1] 0.3398021

and what I would like it to be is:

snp_id                     pvalue                chisq         
    chr11441806                  0.6274769       0.9111779
    chr1144192891                 NA              NA

the answer:

library(data.table)
setDT(merged_df_normal)

merged_df_normal[,.(p.value=catt(is_sever,encoding)),snp_id]

worked really well for getting just the p.value but is there a way to edit the above answer and add a new column chisq? thank you for the help the previous answer

Upvotes: 0

Views: 82

Answers (1)

langtang
langtang

Reputation: 24832

I believe you can just apply catt() to each group of sn_id. Let's say your original data is called df. Then, you can do the following:

library(data.table)
setDT(df)
df[,.(p.value=catt(is_severe,encoding)),sn_id]

You need to adjust your function so that it handles sn_id groups that don't have sufficient data; in your example data frame, catt() only runs without error on sn_id == chr1 69511..

In general, however, the output will look like this, with one row in the frame for each sn_id value

        sn_id   p.value
       <char>     <num>
1: chr1 69511 0.2482131

Upvotes: 2

Related Questions