Reputation: 105
I have a dataframe in R that looks something like this:
df <- paste0(letters[1:5], ":", round(runif(5), 2))
df <- as.data.frame(t(df))
df2 <- paste0(letters[1:5], ":", round(runif(5), 2))
df2 <- as.data.frame(t(df2))
df3 <- paste0(letters[1:5], ":", round(runif(5), 2))
df3 <- as.data.frame(t(df3))
# shuffle the columns of df2 and df3, then append them as rows 2 and 3
df <- rbind(df, setNames(sample(df2), names(df2)))
df <- rbind(df, setNames(sample(df3), names(df3)))
df
      V1     V2     V3     V4     V5
1 a:0.21 b:0.49 c:0.96 d:0.95 e:0.03
2 b:0.33 c:0.25 e:0.81 d:0.94 a:0.02
3 c:0.34 e:0.26 a:0.31 d:0.55 b:0.63
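Because runif() is called without a seed, the code above produces different numbers on every run (which is also why the answers below show different values). A fixed copy of the data frame printed above, typed in by hand, may help with reproducing the examples; stringsAsFactors = FALSE is used so the columns are character on any R version:

```r
# Fixed copy of the three rows printed above (values typed in, not generated)
df <- data.frame(
  V1 = c("a:0.21", "b:0.33", "c:0.34"),
  V2 = c("b:0.49", "c:0.25", "e:0.26"),
  V3 = c("c:0.96", "e:0.81", "a:0.31"),
  V4 = c("d:0.95", "d:0.94", "d:0.55"),
  V5 = c("e:0.03", "a:0.02", "b:0.63"),
  stringsAsFactors = FALSE
)
df
```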
The characters before the colon identify the type of measurement, and the number after the colon is the measurement itself.
I would like to re-sort each row so that each type of measurement (i.e., the characters before the colon) ends up in the same column. It should look like this:
      V1     V2     V3     V4     V5
1 a:0.21 b:0.49 c:0.96 d:0.95 e:0.03
2 a:0.02 b:0.33 c:0.25 d:0.94 e:0.81
3 a:0.31 b:0.63 c:0.34 d:0.55 e:0.26
Or better yet:
     a    b    c    d    e
1 0.21 0.49 0.96 0.95 0.03
2 0.02 0.33 0.25 0.94 0.81
3 0.31 0.63 0.34 0.55 0.26
Any ideas on how I could reorder each row so that the characters before the colon are the same in a given column?
Any advice would be appreciated!
Upvotes: 4
Views: 169
Reputation: 388797
We can reshape the data to long format, split each value at the colon into two columns (measurement type and value), and then reshape back to wide format.
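library(dplyr)
library(tidyr)
df %>%
  # remember which row each value came from
  mutate(row = row_number()) %>%
  # one record per cell: row, col (original column name), value ("a:0.21")
  pivot_longer(cols = -row, names_to = 'col') %>%
  # split "a:0.21" into col = "a" and numeric value = 0.21;
  # this replaces the original-column-name 'col' created above
  separate(value, into = c('col', 'value'), sep = ":", convert = TRUE) %>%
  # one column per measurement type, one row per original row
  pivot_wider(names_from = col, values_from = value) %>%
  select(-row)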
# A tibble: 3 x 5
# a b c d e
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.1 0.61 0.53 0.82 0.21
#2 0.62 0.93 0.18 0.39 0.34
#3 0.94 0.95 0.41 0.74 0.9
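The same three steps (long format, split at the colon, back to wide) can also be sketched in base R without dplyr/tidyr. This is an alternative added for illustration, not part of the answer above; it assumes every cell has the form type:value with exactly one colon, and it uses the fixed values printed in the question so the result can be checked:

```r
# Question's printed data, typed in so the result is reproducible
df <- data.frame(
  V1 = c("a:0.21", "b:0.33", "c:0.34"),
  V2 = c("b:0.49", "c:0.25", "e:0.26"),
  V3 = c("c:0.96", "e:0.81", "a:0.31"),
  V4 = c("d:0.95", "d:0.94", "d:0.55"),
  V5 = c("e:0.03", "a:0.02", "b:0.63"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)

# 1. Long format: one record per cell, remembering which row it came from
long <- data.frame(
  row  = as.vector(row(m)),
  cell = as.vector(m),
  stringsAsFactors = FALSE
)

# 2. Split each cell at the colon into a type and a numeric value
long$type  <- sub(":.*", "", long$cell)
long$value <- as.numeric(sub(".*:", "", long$cell))

# 3. Wide format: one column per type, filled by (row, type) position
types <- sort(unique(long$type))
wide <- matrix(NA_real_, nrow = nrow(m), ncol = length(types),
               dimnames = list(NULL, types))
wide[cbind(long$row, match(long$type, types))] <- long$value
res <- as.data.frame(wide)
res
```

This gives the "better yet" layout from the question (columns a to e). Because the wide matrix starts out as NA and is filled by position, a row that lacks one of the types gets an NA in that column instead of having its remaining values shifted.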
Upvotes: 2
Reputation: 72563
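You could use substr() to sort each row by the first character of each cell (this assumes each measurement type is a single character):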
res <- as.data.frame(t(apply(df, 1, function(x) x[order(substr(x, 1, 1))])))
res
# V1 V2 V3 V4 V5
# 1 a:0.96 b:0.94 c:0.34 d:0.85 e:0.2
# 2 a:0.84 b:0.32 c:0.78 d:0.67 e:0.32
# 3 a:0.59 b:0.82 c:0.79 d:0.7 e:0.2
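Or, to strip the prefixes and convert to numeric (again assuming single-character prefixes, since substring(x, 3) drops the first two characters):
# drop the "a:" prefix, order by the prefix letter, convert to numeric
res <- as.data.frame(t(apply(df, 1, function(x)
  as.numeric(substring(x, 3)[order(substr(x, 1, 1))]))))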
res
# V1 V2 V3 V4 V5
# 1 0.96 0.94 0.34 0.85 0.20
# 2 0.84 0.32 0.78 0.67 0.32
# 3 0.59 0.82 0.79 0.70 0.20
str(res)
# 'data.frame': 3 obs. of 5 variables:
# $ V1: num 0.96 0.84 0.59
# $ V2: num 0.94 0.32 0.82
# $ V3: num 0.34 0.78 0.79
# $ V4: num 0.85 0.67 0.7
# $ V5: num 0.2 0.32 0.2
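The numeric version keeps the generic V1 to V5 names. A sketch that takes the column names from the sorted prefixes instead, giving the question's a to e layout, under the same single-character-prefix assumption as substr(x, 1, 1). It also adds a check that every row ends up with the same types in the same order, since sorting alone would silently misalign a row that is missing a type:

```r
df <- data.frame(
  V1 = c("a:0.21", "b:0.33", "c:0.34"),
  V2 = c("b:0.49", "c:0.25", "e:0.26"),
  V3 = c("c:0.96", "e:0.81", "a:0.31"),
  V4 = c("d:0.95", "d:0.94", "d:0.55"),
  V5 = c("e:0.03", "a:0.02", "b:0.63"),
  stringsAsFactors = FALSE
)

# Sort each row by its one-character prefix (as in the answer above)
sorted <- t(apply(df, 1, function(x) x[order(substr(x, 1, 1))]))

# Strip the "x:" prefix, convert to numeric and restore the matrix shape
vals <- matrix(as.numeric(substring(sorted, 3)), nrow = nrow(sorted))

# Name the columns after the prefixes of the first sorted row
res <- as.data.frame(vals)
names(res) <- substr(sorted[1, ], 1, 1)

# Guard: every row must contain the same types in the same order
same_types <- apply(sorted, 1, function(x) identical(unname(substr(x, 1, 1)), names(res)))
stopifnot(all(same_types))
res
```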
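Or, using regular expressions, which also works when the prefixes are longer than one character (see the data for df2 below):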
as.data.frame(t(apply(df2, 1, function(x) {
  g1 <- gsub("(\\w+\\:).*", "\\1", x)  # measurement type, e.g. "cc:"
  g2 <- gsub("\\w+\\:(.*)", "\\1", x)  # measurement value, e.g. "0.78"
  as.numeric(g2[order(g1)])
})))
# V1 V2 V3 V4 V5
# 1 0.96 0.94 0.34 0.85 0.2
# 2 0.84 0.32 0.78 0.67 0.32
# 3 0.59 0.82 0.79 0.7 0.2
Data for df2 (a copy of df in which row 2 has multi-character prefixes):
df2 <- df
df2[] <- lapply(df2, as.character)  # make sure columns are character before assigning new strings
df2[2, ] <- c("cc:0.78", "e:0.32", "dd:0.67", "a:0.84", "bb:0.32")
df2
# V1 V2 V3 V4 V5
# 1 a:0.96 b:0.94 c:0.34 d:0.85 e:0.2
# 2 cc:0.78 e:0.32 dd:0.67 a:0.84 bb:0.32
# 3 d:0.7 b:0.82 e:0.2 a:0.59 c:0.79
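The two gsub() calls can also be replaced by a single strsplit() on the colon, which likewise handles multi-character prefixes such as cc. A minimal sketch, assuming each cell contains exactly one colon; sort_row is a hypothetical helper name, and the df2 values are the ones printed above:

```r
# df2 as printed above: row 2 uses multi-character prefixes
df2 <- data.frame(
  V1 = c("a:0.96", "cc:0.78", "d:0.7"),
  V2 = c("b:0.94", "e:0.32", "b:0.82"),
  V3 = c("c:0.34", "dd:0.67", "e:0.2"),
  V4 = c("d:0.85", "a:0.84", "a:0.59"),
  V5 = c("e:0.2", "bb:0.32", "c:0.79"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each cell at the colon (assumes exactly one
# colon per cell), then return the numeric values ordered by type
sort_row <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  type  <- vapply(parts, `[`, character(1), 1)
  value <- as.numeric(vapply(parts, `[`, character(1), 2))
  value[order(type)]
}

res <- as.data.frame(t(apply(df2, 1, sort_row)))
res
```

As with the regex version, rows are sorted positionally, so row 2 becomes 0.84 0.32 0.78 0.67 0.32 even though its types (a, bb, cc, dd, e) differ from those in row 1.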
Upvotes: 3