I have a data.frame like this:
x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2'), score=c(1,2,3,4), label=c('N','N','F','F'))
x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F
I want to pair up the two groups (N and F) by matching on a substring of the names. For example, NG_1 matches FG_1. I am looking for a result something like this:
y <- data.frame(name1=c('NG_1','NG_2'), name2=c('FG_1', 'FG_2'), score1=c(1,2), score2=c(3,4))
y
  name1 name2 score1 score2
1  NG_1  FG_1      1      3
2  NG_2  FG_2      2      4
The resulting table doesn't need to look exactly like the one above, but I do want the matching scores paired up.
The only way I can think of is to run a for-loop over all rows with label == 'N' and match each of them to the corresponding F row. Is there anything better?
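A loop is not strictly needed in base R. The sketch below is a swapped-in technique (a join with merge(), not anything from the question): it assumes the matching substring is everything after the first "_", extracts it into a helper column id (an illustrative name) and joins the N rows to the F rows on it in one vectorized step.

```r
# Sketch, assuming the pairing key is the text after the first "_"
x <- data.frame(names = c('NG_1', 'NG_2', 'FG_1', 'FG_2'),
                score = c(1, 2, 3, 4),
                label = c('N', 'N', 'F', 'F'),
                stringsAsFactors = FALSE)

# Hypothetical helper column holding the shared suffix ("1", "2", ...)
x$id <- sub("^[^_]*_", "", x$names)

n_rows <- x[x$label == "N", c("id", "names", "score")]
f_rows <- x[x$label == "F", c("id", "names", "score")]

# merge() pairs every N row with the F row that has the same id;
# the suffixes turn the duplicated columns into names1/score1 and names2/score2
y <- merge(n_rows, f_rows, by = "id", suffixes = c("1", "2"))
y <- y[, c("names1", "names2", "score1", "score2")]
y
```

Unlike matching by row position, this still pairs correctly if the N and F rows are in different orders; an N row without an F partner is dropped unless you pass all.x = TRUE to merge().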
Here is a way using dplyr/tidyr:
> library(dplyr)
> library(tidyr)
> x <- data.frame(names=c('NG_1', 'NG_2', 'FG_1', 'FG_2')
+ , score=c(1,2,3,4)
+ , label=c('N','N','F','F')
+ , stringsAsFactors = FALSE
+ )
> x
  names score label
1  NG_1     1     N
2  NG_2     2     N
3  FG_1     3     F
4  FG_2     4     F
> # create new 'label' for grouping
> x$label <- substring(x$names, 4, 4) # 4th character, e.g. "1" from "NG_1" (assumes one-digit suffixes)
> x %>%
+ gather(key, value, -label) %>% # wide to long using 'label'
+ group_by(label, key) %>% # group for adding newkey
+ mutate(newkey = paste(key, seq_along(key), sep = "_")) %>% # number the rows within each group
+ ungroup %>% # remove grouping criteria
+ select(-key) %>% # remove the 'key' column -- not needed
+ spread(newkey, value) %>% # long to wide
+ select(-label) # remove the 'label' column -- not needed
Source: local data frame [2 x 4]
  names_1 names_2 score_1 score_2
    (chr)   (chr)   (chr)   (chr)
1    NG_1    FG_1       1       3
2    NG_2    FG_2       2       4
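In tidyr 1.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming the pairing key is the text after the first "_" (id and idx are illustrative helper columns, not part of the answer above). Because pivot_wider() can spread several value columns at once, names and score are never stacked into one column, so the scores stay numeric rather than coming back as character as in the output above.

```r
library(dplyr)
library(tidyr)

x <- data.frame(names = c('NG_1', 'NG_2', 'FG_1', 'FG_2'),
                score = c(1, 2, 3, 4),
                label = c('N', 'N', 'F', 'F'),
                stringsAsFactors = FALSE)

y <- x %>%
  mutate(id = sub("^[^_]*_", "", names)) %>%  # assumed pairing key: text after "_"
  group_by(id) %>%
  mutate(idx = row_number()) %>%              # 1 = first row of the pair, 2 = second
  ungroup() %>%
  select(id, idx, names, score) %>%           # drop 'label'; id becomes the row key
  pivot_wider(names_from = idx,
              values_from = c(names, score),
              names_sep = "") %>%             # -> names1, names2, score1, score2
  select(-id)
y
```

As with the answer above, which row counts as "1" depends on the original row order within each pair; here the N rows come first, so they land in names1/score1.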
We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(x)), create a grouping variable ("Grp") and a sequence ("N") based on 'label', then use dcast (which can take multiple value.var columns) to convert from 'long' to 'wide' format.
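library(data.table)
setDT(x)[, Grp := .GRP, label]  # group id per label: N -> 1, F -> 2
x[, N := 1:.N, label]           # row index within each label
# long to wide on (N, Grp), then drop the helper column N
dcast(x, N ~ Grp, value.var = c('names', 'score'), sep = '')[, N := NULL][]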
#    names1 names2 score1 score2
# 1:   NG_1   FG_1      1      3
# 2:   NG_2   FG_2      2      4
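The approach above pairs rows by their position within each label (1:.N), not by the substring itself, so it relies on the N and F rows appearing in the same order. A hedged variant keys dcast on the suffix instead, assuming the match key is everything after the first "_" (id is an illustrative column name, and the N -> 1, F -> 2 mapping is fixed with match() rather than .GRP so it does not depend on which label appears first):

```r
library(data.table)

x <- data.table(names = c('NG_1', 'NG_2', 'FG_1', 'FG_2'),
                score = c(1, 2, 3, 4),
                label = c('N', 'N', 'F', 'F'))

# Assumed pairing key: everything after the first "_"
x[, id := sub("^[^_]*_", "", names)]
# Explicit column order: N -> 1, F -> 2
x[, Grp := match(label, c("N", "F"))]

# One row per id; names/score spread into names1, names2, score1, score2
y <- dcast(x, id ~ Grp, value.var = c("names", "score"), sep = "")[, id := NULL][]
y
```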