Reputation: 1667
I have two data frames: one holding a correlation matrix and one holding the corresponding p-values (both are ordered the same way, so each position in one data frame corresponds to the same variable in the other). I would like to filter elements in the correlation data frame based on two conditions:
1) Compare each column to the element in the first row of that column; smaller elements should become NA or 0.
2) If the corresponding element in the second data frame is larger than, say, 0.05, the element should likewise become NA or 0.
An example would look like this:
set.seed(2)
a = sample(runif(5) ,rep=TRUE)
b = sample(runif(5) ,rep=TRUE)
c = sample(runif(5) ,rep=TRUE)
corr_mat = data.frame(a,b,c)
a = sample(runif(5,0,0.1) ,rep=TRUE)
b = sample(runif(5,0,0.1) ,rep=TRUE)
c = sample(runif(5,0,0.1) ,rep=TRUE)
p_values= data.frame(a,b,c)
So I would like to subset corr_mat so that in column a only those values remain that are at least as large as the value in row 1 of column a AND whose corresponding p-value in p_values is smaller than 0.05.
These are the two data frames:
> corr_mat
          a         b         c
1 0.9438393 0.4052822 0.8368892
2 0.1848823 0.4052822 0.6618988
3 0.9438393 0.2388948 0.3875495
4 0.5733263 0.7605133 0.3472722
5 0.5733263 0.5526741 0.6618988
> p_values
            a          b          c
1 0.086886104 0.01632009 0.02754012
2 0.051428176 0.09440418 0.09297202
3 0.016464224 0.02970107 0.02754012
4 0.086886104 0.01150841 0.09297202
5 0.001041453 0.09440418 0.09297202
Target output (based on 1st condition, bigger than or equal to first row value for each column):
> corr_mat
          a         b         c
1 0.9438393 0.4052822 0.8368892
2           0.4052822
3 0.9438393
4           0.7605133
5           0.5526741
Target output (based on both conditions, now also excluding elements whose corresponding p-values are larger than 0.05):
> corr_mat
          a         b         c
1 0.9438393 0.4052822 0.8368892
2
3 0.9438393
4           0.7605133
5
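I was thinking of something along the lines of:
sapply(names(corr_mat), function(nm) comp(corr_mat[[nm]], p_values[[nm]]))
where comp compares each element of a corr_mat column with row 1 of that column and also checks the corresponding element in p_values:
comp <- function(x, p) {
  # x: one column of corr_mat; p: the matching column of p_values
  for (i in seq_along(x)) {
    # blank the element if it is below the first-row value or its p-value exceeds 0.05
    if (x[i] < x[1] || p[i] > 0.05) {
      x[i] <- NA
    }
  }
  x  # return the filtered column
}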
Upvotes: 1
Views: 305
Reputation: 886948
We could also do this by multiplying by NA raised to a logical matrix (NA^TRUE is NA, NA^FALSE is 1):
corr_mat *NA^(corr_mat < corr_mat[1,][col(corr_mat)] | p_values > 0.05 )
#          a         b         c
#1        NA 0.4052822 0.8368892
#2        NA        NA        NA
#3 0.9438393        NA        NA
#4        NA 0.7605133        NA
#5        NA        NA        NA
Or just assign NA in place:
corr_mat[corr_mat < corr_mat[1,][col(corr_mat)] | p_values > 0.05] <- NA
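To see why the one-liner works, here is a minimal sketch on a toy matrix. The values and the names m, first_row, below and result are made up for illustration and are not the question's data; a plain matrix is assumed instead of a data frame to keep the indexing easy to follow.

```r
# Toy 3 x 2 matrix (hypothetical values, not the question's data)
m <- matrix(c(0.5, 0.2, 0.9,
              0.3, 0.6, 0.1), nrow = 3)

# col(m) holds each cell's column index, so indexing the first row with it
# repeats every column's first-row value down that column.
first_row <- m[1, ][col(m)]

# TRUE where a cell is smaller than the first-row value of its column
below <- m < first_row

# NA^TRUE is NA and NA^FALSE is 1, so multiplying by NA^below blanks
# exactly the TRUE cells and leaves the others unchanged.
result <- m * NA^below
print(result)
```

The in-place assignment form gives the same cells as NA; the multiplication form is handy when the original object should stay untouched.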
Upvotes: 1
Reputation: 388817
We can use mapply to apply both conditions in one go using replace. We replace the values that satisfy either condition with NA.
mapply(function(x, y) replace(x, (x < x[1]) | (y > 0.05), NA),corr_mat, p_values)
#             a         b         c
#[1,]        NA 0.4052822 0.8368892
#[2,]        NA        NA        NA
#[3,] 0.9438393        NA        NA
#[4,]        NA 0.7605133        NA
#[5,]        NA        NA        NA
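Note that mapply returns a matrix here, as the [1,] row labels show. If a data frame is preferred, one possible sketch is to fill a copy column by column with Map. It assumes the values printed in the question are entered literally (the set.seed example may not reproduce them on every R version), and the name filtered is only illustrative.

```r
# Data entered literally from the values printed in the question
corr_mat <- data.frame(
  a = c(0.9438393, 0.1848823, 0.9438393, 0.5733263, 0.5733263),
  b = c(0.4052822, 0.4052822, 0.2388948, 0.7605133, 0.5526741),
  c = c(0.8368892, 0.6618988, 0.3875495, 0.3472722, 0.6618988)
)
p_values <- data.frame(
  a = c(0.086886104, 0.051428176, 0.016464224, 0.086886104, 0.001041453),
  b = c(0.01632009, 0.09440418, 0.02970107, 0.01150841, 0.09440418),
  c = c(0.02754012, 0.09297202, 0.02754012, 0.09297202, 0.09297202)
)

# Copy first so the original correlations stay available
filtered <- corr_mat

# Map walks the columns of both data frames in parallel and returns a list;
# assigning into filtered[] keeps the data frame class and column names.
filtered[] <- Map(
  function(x, y) replace(x, x < x[1] | y > 0.05, NA),
  corr_mat, p_values
)
print(filtered)
```

The NA pattern is the same as in the matrix output above; only the container differs.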
Upvotes: 2