lala

Reputation: 35

Extract elements 10x greater than the last values for multiple columns

I am a new R user. I have a data frame with 50 columns and 300 rows. The first column holds the ID, and the 2nd through the last columns hold the standard deviation (sd) of traits. The pooled sd for each column is given in the last row. For each column, I want to remove all values that are more than ten times the pooled sd, and I want to do this in one run. So far, the script below is what I have come up with to check whether a value is greater than the pooled sd. However, even the ID column (character) is being processed (resulting in all FALSE). If I use raw_sd_summary[-1], I have no way of knowing which ID on which trait meets the criterion I'm looking for.

 logic_sd <- lapply(raw_sd_summary, function(x) x>tail(x,1) )
 logic_sd_df <- as.data.frame(logic_sd)

What should I do? And how can I extract all the values labeled TRUE that are more than ten times the pooled sd (along with their corresponding IDs)?

Upvotes: 0

Views: 84

Answers (1)

Assaf Wool

Reputation: 64

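I think your lapply call already runs over the columns, which is what you want. The two problems are that it also processes the character ID column and that it compares against the pooled sd itself rather than ten times it. Calling apply on the whole data frame would not help, because apply first converts the data frame to a matrix, and a single character column turns everything into character. Drop the ID column before comparing and multiply by 10:

logic_sd <- sapply(raw_sd_summary[-1], function(x) x > 10 * tail(x, 1))

This will give you a logical matrix, one column per trait, that is TRUE wherever a value is more than 10 times the last row. You could recover the IDs by binding the first column back on

logic_sd_df <- cbind(raw_sd_summary[1], logic_sd)

You could remove/replace the unwanted values in the original table directly by

raw_sd_summary[-300, -1][logic_sd[-300, ]] <- NA    # or new value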
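
To list the flagged values together with their IDs and trait names, one option is which() with arr.ind = TRUE on the logical matrix. The following is only a sketch on a small made-up data frame: the column names ID, trait1, trait2 and the numbers are assumptions for illustration, not your real data.

```r
# Toy stand-in for raw_sd_summary (hypothetical names and values):
# the first column is the ID, the last row holds the pooled sd per trait.
raw_sd_summary <- data.frame(
  ID     = c("a", "b", "c", "pooled"),
  trait1 = c(0.5, 25, 1.2, 2),
  trait2 = c(40, 0.3, 0.9, 3),
  stringsAsFactors = FALSE
)

# Compare only the numeric trait columns against 10x their last row
traits <- raw_sd_summary[-1]
flag <- sapply(traits, function(x) x > 10 * tail(x, 1))

# Row/column positions of every TRUE cell, as a two-column matrix
hits <- which(flag, arr.ind = TRUE)

# One row per flagged value: its ID, the trait it belongs to, and the value
outliers <- data.frame(
  ID    = raw_sd_summary[[1]][hits[, "row"]],
  trait = colnames(flag)[hits[, "col"]],
  value = as.matrix(traits)[hits],
  stringsAsFactors = FALSE
)
outliers
```

On this toy input, outliers has two rows: ID b on trait1 (25) and ID a on trait2 (40). The same steps should carry over to the full 300 x 50 table, assuming every column after the first is numeric.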

Upvotes: 0
