DotPi
DotPi

Reputation: 4127

Create count per item by year/decade

I have data in a data.table that is as follows:

> x<-df[sample(nrow(df), 10),]
> x      

>                   Importer                 Exporter       Date

 1:                 Ecuador                  United Kingdom 2004-01-13
 2:                  Mexico                   United States 2013-11-19
 3:               Australia                   United States 2006-08-11
 4:           United States                   United States 2009-05-04
 5:                   India                   United States 2007-07-16
 6:               Guatemala                       Guatemala 2014-07-02
 7:                  Israel                          Israel 2000-02-22
 8:                   India                   United States 2014-02-11
 9:                    Peru                            Peru 2007-03-26
10:                  Poland                          France 2014-09-15

I am trying to create summaries so that given a time period (say a decade), I can find the number of time each country appears as Importer and Exporter. So, in the above example the desired output when dividing up by decade should be something like:

Decade    Country.Name    Importer.Count         Exporter.Count

2000      Ecuador         1                      0
2000      Mexico          1                      1
2000      Australia       1                      0
2000      United States   1                      3
.
.
.
2010     United States    0                      2
.
.
.

So far, I have tried with aggregate and data.table methods as suggested by the post here, but both of them seem to just give me counts of the number Importers/Exporters per year (or decade as I am more interested in that).

> x$Decade<-year(x$Date)-year(x$Date)%%10
> importer_per_yr<-aggregate(Importer ~ Decade, FUN=length, data=x)
> importer_per_yr

   Decade                      Importer

2   2000                       6
3   2010                       4

Considering that aggregate uses the formula interface, I tried adding another criteria, but got the following error:

> importer_per_yr<-aggregate(Importer~ Decade + unique(Importer), FUN=length, data=x)
Error in model.frame.default(formula = Importer ~ Decade +  : 
  variable lengths differ (found for 'unique(Importer)')

Is there a way to create the summary according to the decade and the importer/ exporter? It does not matter if the summary for importer and exporter are in different tables.

Upvotes: 1

Views: 312

Answers (2)

Mark Miller
Mark Miller

Reputation: 13113

I think with will work with aggregate in base R:

my.data <- read.csv(text = '
        Importer,             Exporter,           Date
         Ecuador,       United Kingdom,     2004-01-13
          Mexico,        United States,     2013-11-19
       Australia,        United States,     2006-08-11
   United States,        United States,     2009-05-04
           India,        United States,     2007-07-16
       Guatemala,            Guatemala,     2014-07-02
          Israel,               Israel,     2000-02-22
           India,        United States,     2014-02-11
            Peru,                 Peru,     2007-03-26
          Poland,               France,     2014-09-15
', header = TRUE, stringsAsFactors = TRUE, strip.white = TRUE)

my.data$my.Date <- as.Date(my.data$Date, format = "%Y-%m-%d")

my.data <- data.frame(my.data,
                 year  = as.numeric(format(my.data$my.Date, format = "%Y")),
                 month = as.numeric(format(my.data$my.Date, format = "%m")),
                 day   = as.numeric(format(my.data$my.Date, format = "%d")))

my.data$my.decade <- my.data$year - (my.data$year %% 10)

importer.count <- with(my.data, aggregate(cbind(count = Importer) ~ my.decade + Importer, FUN = function(x) { NROW(x) }))
exporter.count <- with(my.data, aggregate(cbind(count = Exporter) ~ my.decade + Exporter, FUN = function(x) { NROW(x) }))

colnames(importer.count) <- c('my.decade', 'country', 'importer.count')
colnames(exporter.count) <- c('my.decade', 'country', 'exporter.count')

my.counts <- merge(importer.count, exporter.count, by = c('my.decade', 'country'), all = TRUE)

my.counts$importer.count[is.na(my.counts$importer.count)] <- 0
my.counts$exporter.count[is.na(my.counts$exporter.count)] <- 0

my.counts

#    my.decade        country importer.count exporter.count
# 1       2000      Australia              1              0
# 2       2000        Ecuador              1              0
# 3       2000          India              1              0
# 4       2000         Israel              1              1
# 5       2000           Peru              1              1
# 6       2000  United States              1              3
# 7       2000 United Kingdom              0              1
# 8       2010      Guatemala              1              1
# 9       2010          India              1              0
# 10      2010         Mexico              1              0
# 11      2010         Poland              1              0
# 12      2010  United States              0              2
# 13      2010         France              0              1

Upvotes: 0

akrun
akrun

Reputation: 887501

We can do this using data.table methods, Create the 'Decade' column by assignment :=, then melt the data from 'wide' to 'long' format by specifying the measure columns, reshape it back to 'wide' using dcast and we use the fun.aggregate as length.

x[, Decade:= year(Date) - year(Date) %%10]
dcast(melt(x, measure = c("Importer", "Exporter"), value.name = "Country"), 
                       Decade + Country~variable, length)
#     Decade        Country Importer Exporter
# 1:   2000      Australia        1        0
# 2:   2000        Ecuador        1        0
# 3:   2000          India        1        0
# 4:   2000         Israel        1        1
# 5:   2000           Peru        1        1
# 6:   2000 United Kingdom        0        1
# 7:   2000  United States        1        3
# 8:   2010         France        0        1
# 9:   2010      Guatemala        1        1
#10:   2010          India        1        0
#11:   2010         Mexico        1        0
#12:   2010         Poland        1        0
#13:   2010  United States        0        2

Upvotes: 2

Related Questions