Reputation: 21
I have a data set which looks like this
Variable A1 A2 A3 B1 B2 B3
Item 1 Men 1 3 3 1 2 3
Item 2 Men 1 3 3 1 2 3
Item 1 Men 1 3 3 1 2 3
Item 3 Men 2 1 2 5 3 3
Item 2 Men 1 3 3 1 2 3
I need to group columns A1, A2, A3
to be A
and B1,B2,B3
to be B
, thereafter I would like to sum the values.
How do I do this in R ?
Upvotes: 2
Views: 194
Reputation: 70256
If you only have to do this for a few variables, you could do:
df$A <- rowSums(df[grep("^A\\d+$", names(df))])
df$B <- rowSums(df[grep("^B\\d+$", names(df))])
df[!names(df) %in% grep("^[A-Z]{1}\\d+$", names(df), value = TRUE)]
df
# Variable Sex A B
#1 Item_1 Men 7 6
#2 Item_2 Men 7 6
#3 Item_1 Men 7 6
#4 Item_3 Men 5 11
#5 Item_2 Men 7 6
(I called the "MEN" column Sex since it seemed like there were too few column names).
The grep()
command I wrote above uses regular expressions. For example, "^A\\d+$"
matches those column names that start with "A" followed by any number of digits. The grep
command returns the indices where those matches were found (in the column names) and therefore the rowSums
will only add up the right columns.
A very similar concept is used in the third line to drop the columns which are of the format: Any single capital letter (A to Z) followed by any number of digits, will be dropped.
Upvotes: 2
Reputation: 886998
If you have many
groups, you could convert the wide
format to long
format and then do the necessary calculation. Here is one way using dplyr
library(dplyr)
library(tidyr)
df$indx <- 1:nrow(df)
df %>%
gather(Var, Val, -Variable,-indx) %>%
extract(Var, c('Var1', 'Var2'), '(.)(.)') %>%
group_by(Variable, indx, Var1) %>%
summarise(Val=sum(Val)) %>%
ungroup() %>%
spread(Var1, Val) %>%
arrange(indx)
# Variable indx A B
#1 Item 1 Men 1 7 6
#2 Item 2 Men 2 7 6
#3 Item 1 Men 3 7 6
#4 Item 3 Men 4 5 11
#5 Item 2 Men 5 7 6
Or another option would be to use Map
with rowSums
indx <- sub('\\d+', '', colnames(df)[-1])
lst <- split(colnames(df)[-1],indx)
df1 <- df[1]
Un <- unique(indx)
df1[Un] <- Map(function(x,y) rowSums(x[y], na.rm=TRUE),
list(df), lst)
df1
# Variable A B
#1 Item 1 Men 7 6
#2 Item 2 Men 7 6
#3 Item 1 Men 7 6
#4 Item 3 Men 5 11
#5 Item 2 Men 7 6
Or if the columns are ordered
ie. A1, A2,A3
followed by B1, B2, B3
etc. and the number of columns per each group is the same. (Change the dim
accordingly)
df1[Un] <- apply(array(as.matrix(df[-1]),
dim=c(nrow(df),3,length(Un))), c(1,3), sum)
df <- structure(list(Variable = c("Item 1 Men", "Item 2 Men", "Item 1 Men",
"Item 3 Men", "Item 2 Men"), A1 = c(1L, 1L, 1L, 2L, 1L), A2 = c(3L,
3L, 3L, 1L, 3L), A3 = c(3L, 3L, 3L, 2L, 3L), B1 = c(1L, 1L, 1L,
5L, 1L), B2 = c(2L, 2L, 2L, 3L, 2L), B3 = c(3L, 3L, 3L, 3L, 3L
)), .Names = c("Variable", "A1", "A2", "A3", "B1", "B2", "B3"
), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 0
Reputation: 158
test.df$A <- test.df$A1 + test.df$A2 + test.df$A3
and the same for B. This code will just create a new variable in your dataframe named A with the calculation.
Upvotes: 0