Reputation: 39
I have data that looks like this :
is_severe encoding sn_id
6 1 1 chr1 17689
7 0 2 chr1 17689
8 1 1 chr1 17689
9 1 2 chr1 69511
10 1 2 chr1 69511
11 1 1 chr1 69511
12 0 1 chr1 69511
I performed a statistical test on every "group" of values based on the sn_id column.
this is the function for the statistical test:
catt <-
function(y, x, score = c(0, 1, 2)) {
miss <- unique(c(which(is.na(y)), which(is.na(x))))
n.miss <- length(miss)
if(n.miss > 0) {
y <- y[-miss]
x <- x[-miss]
}
if(!all((y == 0) | (y == 1)))
stop("y should be only 0 or 1.")
if(!all((x == 0) | (x == 1) |(x == 2)))
stop("x should be only 0, 1 or 2.")
ca <- x [y == 1]
co <- x [y == 0]
htca <- table(ca)
htco <- table(co)
A <- matrix(0, 2, 3)
colnames(A) <- c(0, 1, 2)
rownames(A) <- c(0, 1)
A[1, names(htca)] <- htca
A[2, names(htco)] <- htco
ptt <- prop.trend.test(A[1, ], colSums(A), score = score)
p.value = as.numeric(ptt$p.value)
res=p.value
return(res)}
and i performed it on the groups of snp_id using the by function:
send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt(merged_df_normal$is_sever_int,merged_df_normal$encoding)})
and got these results for example :
merged_df_normal$snp_id: chr11441806
[1] 0.6274769
---------------------------------------------------------------------
merged_df_normal$snp_id: chr1144192891
[1] NA
i wanted to transform this into a data frame which will look like this:
snp_id pvalue
chr11441806 0.6274769
chr1144192891 NA
I tried this :
do.call(rbind,list(send)
and it returned a matrix that looks like this:
chr11441806 chr1144192891
0.6274769 NA
I had to edit the function after accepting an answer :
catt_2 <-
function(y, x, score = c(0, 1, 2)) {
miss <- unique(c(which(is.na(y)), which(is.na(x))))
n.miss <- length(miss)
if(n.miss > 0) {
y <- y[-miss]
x <- x[-miss]
}
if(!all((y == 0) | (y == 1)))
stop("y should be only 0 or 1.")
if(!all((x == 0) | (x == 1) |(x == 2)))
stop("x should be only 0, 1 or 2.")
ca <- x [y == 1]
co <- x [y == 0]
htca <- table(ca)
htco <- table(co)
A <- matrix(0, 2, 3)
colnames(A) <- c(0, 1, 2)
rownames(A) <- c(0, 1)
A[1, names(htca)] <- htca
A[2, names(htco)] <- htco
ptt <- prop.trend.test(A[1, ], colSums(A), score = score)
res <- list(
chisq = as.numeric(ptt$statistic),
p.value = as.numeric(ptt$p.value)
)
return(res)
}
and now the results are :
send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt_2(merged_df_normal$is_sever,merged_df_normal$encoding)})
merged_df_normal$snp_id: chr11007252
$chisq
[1] NA
$p.value
[1] NA
------------------------------------------------------------------------
merged_df_normal$snp_id: chr1100731820
$chisq
[1] 0.9111779
$p.value
[1] 0.3398021
and what I would like it to be is:
snp_id pvalue chisq
chr11441806 0.6274769 0.9111779
chr1144192891 NA NA
the answer:
library(data.table)
setDT(merged_df_normal)
merged_df_normal[,.(p.value=catt(is_sever,encoding)),snp_id]
worked really well for getting just the p.value but is there a way to edit the above answer and add a new column chisq? thank you for the help the previous answer
Upvotes: 0
Views: 82
Reputation: 24832
I believe you can just apply catt()
to each group of sn_id
. Let's say your original data is called df
. Then, you can do the following:
library(data.table)
setDT(df)
df[,.(p.value=catt(is_severe,encoding)),sn_id]
You need to adjust your function so that it handles sn_id
groups that don't have sufficient data; in your example data frame, catt()
only runs without error on sn_id == chr1 69511
..
In general, however, the output will look like this, with one row in the frame for each sn_id
value
sn_id p.value
<char> <num>
1: chr1 69511 0.2482131
Upvotes: 2