Cris_91
Cris_91

Reputation: 39

How can I sum a column of multiple datasets that do not have the same structure?

I have several files (20) with the same column structure but with different rows structure. All are composed of two columns, the first are factors and the second are integers. I want to sum the column of integers for the factors that are repeated and the new ones that are simply added. How could I combine and sum the ones already repeated?

I've thought about combining cbind and tapply but I really do not know how to carry this out.

A simple example of the structure of the files:

Shop   Clients     Shop  Clients     Shop Clients
 A        9          D      8          A     5
 B        7          A      4          R     4
 C        4          F      3          C     3
 D        2          B      1          B     2

I expect the output:

Shop Clients
A      18 
B      10
C       7
D      10
F       3
R       4

I read the different files in a loop, creating a dataset for each of them, so that the dataset shows City1$Shop and City1$Clients for example. This case is only for 20 files but I would like to know how to work with more (for example 100). How could I solve this problem by reading the datasets in this way?

f<-function(x){
  read.delim2("p01.txt",header=T,sep="\t",stringsAsFactors = FALSE)
}
for(i in x){
total<-f(i)
#Here I suppose I would combine and sum the datasets
}

Upvotes: 2

Views: 791

Answers (3)

Rui Barradas
Rui Barradas

Reputation: 76621

A possibility is to use aggregate after reshaping the data to a long format.

inx <- grep("Shop", names(df1))
long <- do.call(rbind, lapply(inx, function(i) df1[i:(i + 1)]))
aggregate(Clients ~ Shop, long, sum)
#  Shop Clients
#1    A      18
#2    B      10
#3    C       7
#4    D      10
#5    F       3
#6    R       4

Edit.
After the edit to the question, I believe the following does what is asked. I will once again use aggregate.

fnames <- list.files(pattern = "\\.txt")
df_list <- lapply(fnames, read.table, header = TRUE)
df_all <- do.call(rbind, df_list)
aggregate(Clients ~ Shop, data = df_all, sum)

Upvotes: 1

akrun
akrun

Reputation: 887691

We can melt the data into 'long' format by specifying the measure columns with patterns of column names as 'Shop' 'Clients', then grouped by 'Shop' get the sum of 'Clients'

library(data.table)
melt(setDT(df1), measure = patterns("^Shop", "^Clients"), 
  value.name = c("Shop", "Clients"))[, .(Clients = sum(Clients)), by = Shop]
#    Shop Clients
#1:    A      18
#2:    B      10
#3:    C       7
#4:    D      10
#5:    F       3
#6:    R       4

Or using tidyverse

library(tidyverse)
map_dfc(list(Shop = "Shop", Clients = "Clients"), ~
    df1 %>%
       select(matches(.x)) %>% 
       unlist) %>% 
  group_by(Shop) %>% 
  summarise(Clients = sum(Clients))
# A tibble: 6 x 2
#  Shop  Clients
#  <chr>   <int>
#1 A          18
#2 B          10
#3 C           7
#4 D          10
#5 F           3
#6 R           4

Or with rowsum from base R

i1 <- grepl("^Shop", names(df1))
rowsum(unlist(df1[!i1]), group =  unlist(df1[i1]))

data

df1 <- structure(list(Shop = c("A", "B", "C", "D"), Clients = c(9L, 
 7L, 4L, 2L), Shop.1 = c("D", "A", "F", "B"), Clients.1 = c(8L, 
 4L, 3L, 1L), Shop.2 = c("A", "R", "C", "B"), Clients.2 = 5:2), 
  class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 40171

One tidyverse possibility could be:

df %>%
 select_at(vars(contains("Shop"))) %>%
 gather(var1, val1) %>%
 bind_cols(df %>%
 select_at(vars(contains("Client"))) %>%
 gather(var2, val2)) %>%
 group_by(Shop = val1) %>%
 summarise(Clients = sum(val2))

  Shop  Clients
  <chr>   <int>
1 A          18
2 B          10
3 C           7
4 D          10
5 F           3
6 R           4

The same with base R:

long_df <- data.frame(Shop = stack(df[, grepl("Shop", names(df))])[, 1], 
Clients = stack(df[, grepl("Client", names(df))])[, 1])
aggregate(Clients ~ Shop, long_df, sum)

  Shop Clients
1    A      18
2    B      10
3    C       7
4    D      10
5    F       3
6    R       4

Upvotes: 1

Related Questions