Reputation: 3
I have looked at similar posts but haven't gotten anything to solve my issue.
I have a dataframe (df) with different sizes of two characteristic (S042: 4 sizes, S081: 3 sizes) as variables for 5 individuals (Sample):
Sample S042_91 S042_93 S042_105 S042_107 S081_184 S081_187 S081_188
1 6001 91 0 0 107 184 187 0
2 6002 0 93 105 0 0 0 188
3 6005 91 0 105 0 0 0 188
4 6006 0 0 0 107 0 187 188
5 6008 91 0 0 107 NA NA NA
I would like to create a new dataframe reorganizing data per characteristic limiting to two columns (since each individual has one or two sizes, no more). If individual has one size, the second column must have a "0". If individual has no sizes (NA), both columns must have "NA". Also, rename characteristic column names to _1 and _2 keeping characteristic name.
New dataframe would end up looking like:
Sample S042_1 S042_2 S081_1 S081_2
1 6001 91 107 184 187
2 6002 93 105 188 0
3 6005 91 105 188 0
4 6006 107 0 187 188
5 6008 91 107 NA NA
Thanks a lot in advance.
Upvotes: 0
Views: 157
Reputation: 887841
Create a vector of unique column names that have different prefixes excluding the first column. This can be done using gsub
by stripping out the characters starting from the _
. Then, loop this using lapply
and grep
this pattern to get the columns that have the same prefix. You can use apply to get the elements that are not 0
for each row followed by 0's.
size <- 1:2
nm1 <- paste0("^", unique(gsub("\\_.*", "", colnames(df)[-1])))
dfN <- cbind(df[,1],do.call(data.frame,
lapply(nm1, function(x) t(apply(df[grep(x, colnames(df))],1,
function(x) c(x[x!=0], x[x==0])[size])))))
colnames(dfN) <- c("Sample", paste(rep(c('SO42', 'SO81'), each=2), 1:2, sep="_"))
dfN
# Sample SO42_1 SO42_2 SO81_1 SO81_2
#1 6001 91 107 184 187
#2 6002 93 105 188 0
#3 6005 91 105 188 0
#4 6006 107 0 187 188
#5 6008 91 107 NA NA
Another way using dplyr/tidyr
library(dplyr)
library(tidyr)
df %>%
gather(Var, Val, -Sample) %>%
separate(Var, c("Var1", "Var2")) %>%
group_by(Sample, Var1) %>%
arrange(Val==0) %>%
slice(1:2) %>%
mutate(n=row_number()) %>%
ungroup() %>%
unite(Var, Var1, n) %>%
select(-Var2) %>%
spread(Var, Val)
# Sample S042_1 S042_2 S081_1 S081_2
#1 6001 91 107 184 187
#2 6002 93 105 188 0
#3 6005 91 105 188 0
#4 6006 107 0 187 188
#5 6008 91 107 NA NA
Upvotes: 2
Reputation:
S4 <- apply(tmp, 1, function (x) sort(x[2:5], decreasing=TRUE)[1:2])
S8 <- apply(tmp, 1, function (x) sort(x[6:8], decreasing=TRUE)[1:2])
newdata <- cbind(t(S4), t(S8))
Untested but should work. Renaming the columns and making it a data frame is left for you to do.
Upvotes: 0