Reputation: 39
I have several files (20) with the same column structure but with different rows structure. All are composed of two columns, the first are factors and the second are integers. I want to sum the column of integers for the factors that are repeated and the new ones that are simply added. How could I combine and sum the ones already repeated?
I've thought about combining cbind and tapply but I really do not know how to carry this out.
A simple example of the structure of the files:
Shop Clients Shop Clients Shop Clients
A 9 D 8 A 5
B 7 A 4 R 4
C 4 F 3 C 3
D 2 B 1 B 2
I expect the output:
Shop Clients
A 18
B 10
C 7
D 10
F 3
R 4
I read the different files in a loop, creating a dataset for each of them, so that the dataset shows City1$Shop and City1$Clients for example. This case is only for 20 files but I would like to know how to work with more (for example 100). How could I solve this problem by reading the datasets in this way?
f<-function(x){
read.delim2("p01.txt",header=T,sep="\t",stringsAsFactors = FALSE)
}
for(i in x){
total<-f(i)
#Here I suppose I would combine and sum the datasets
}
Upvotes: 2
Views: 791
Reputation: 76621
A possibility is to use aggregate
after reshaping the data to a long format.
inx <- grep("Shop", names(df1))
long <- do.call(rbind, lapply(inx, function(i) df1[i:(i + 1)]))
aggregate(Clients ~ Shop, long, sum)
# Shop Clients
#1 A 18
#2 B 10
#3 C 7
#4 D 10
#5 F 3
#6 R 4
Edit.
After the edit to the question, I believe the following does what is asked. I will once again use aggregate
.
fnames <- list.files(pattern = "\\.txt")
df_list <- lapply(fnames, read.table, header = TRUE)
df_all <- do.call(rbind, df_list)
aggregate(Clients ~ Shop, data = df_all, sum)
Upvotes: 1
Reputation: 887691
We can melt
the data into 'long' format by specifying the measure
columns with patterns
of column names as 'Shop' 'Clients', then grouped by 'Shop' get the sum
of 'Clients'
library(data.table)
melt(setDT(df1), measure = patterns("^Shop", "^Clients"),
value.name = c("Shop", "Clients"))[, .(Clients = sum(Clients)), by = Shop]
# Shop Clients
#1: A 18
#2: B 10
#3: C 7
#4: D 10
#5: F 3
#6: R 4
Or using tidyverse
library(tidyverse)
map_dfc(list(Shop = "Shop", Clients = "Clients"), ~
df1 %>%
select(matches(.x)) %>%
unlist) %>%
group_by(Shop) %>%
summarise(Clients = sum(Clients))
# A tibble: 6 x 2
# Shop Clients
# <chr> <int>
#1 A 18
#2 B 10
#3 C 7
#4 D 10
#5 F 3
#6 R 4
Or with rowsum
from base R
i1 <- grepl("^Shop", names(df1))
rowsum(unlist(df1[!i1]), group = unlist(df1[i1]))
df1 <- structure(list(Shop = c("A", "B", "C", "D"), Clients = c(9L,
7L, 4L, 2L), Shop.1 = c("D", "A", "F", "B"), Clients.1 = c(8L,
4L, 3L, 1L), Shop.2 = c("A", "R", "C", "B"), Clients.2 = 5:2),
class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1
Reputation: 40171
One tidyverse
possibility could be:
df %>%
select_at(vars(contains("Shop"))) %>%
gather(var1, val1) %>%
bind_cols(df %>%
select_at(vars(contains("Client"))) %>%
gather(var2, val2)) %>%
group_by(Shop = val1) %>%
summarise(Clients = sum(val2))
Shop Clients
<chr> <int>
1 A 18
2 B 10
3 C 7
4 D 10
5 F 3
6 R 4
The same with base R
:
long_df <- data.frame(Shop = stack(df[, grepl("Shop", names(df))])[, 1],
Clients = stack(df[, grepl("Client", names(df))])[, 1])
aggregate(Clients ~ Shop, long_df, sum)
Shop Clients
1 A 18
2 B 10
3 C 7
4 D 10
5 F 3
6 R 4
Upvotes: 1