Dag
Dag

Reputation: 589

Problems with aggregating and crosstabulating data

My data looks like this, ca 3300 rows of data:

Year     Location     Catch
1991       0313       45100
1989       0711         323
1991       0312        1100
1991       0313       45100
1989       0711         323
1991       0312         400 
1990       0313      101000
1981       0711         623
1999       0312         410
2000       0313      145100
1987       0711         323
1987       1285         770  
....

Years cover the period 1977-2015, and there are approx 500 different locations, not all having data in every year.

I need an output like this, summing up the catch for each cell, tabulated by location (rows) and year (columns):

Location   '1977' '1978' '1979' '1980' '1981' '1982' '1983' ...
0312         456  11100  12560    320   4566      0  12010  ...
0313         121    100   4500    760    112  12050 100100  ...
0711        5500   6500      0   1205   1201    560  90500  ...
0712         325    215    600  10100      0   4500  11050  ...
1285       10005   2700   1900 101000     50   7500   6050  ...
...          ...    ...    ...    ...    ...    ...    ...

I have asked a similar question previously, and I have tried different versions of the solution suggested successfully for that question, but without success. This is a little different and slightly more complex, and I am getting something wrong.

Grateful for any help.

Upvotes: 0

Views: 38

Answers (2)

akrun
akrun

Reputation: 887078

Here is a tidyverse way. After grouping by 'Year', 'Location', get the sum of 'Catch' and then spread from 'long' to 'wide' format

library(tidyverse)
df %>%
    group_by(Year, Location) %>%
    summarise(Catch = sum(Catch)) %>% 
    spread(Year, Catch, fill = 0)
# A tibble: 4 × 8
#  Location `1981` `1987` `1989` `1990` `1991` `1999` `2000`
#*    <int>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1      312      0      0      0      0   1500    410      0
#2      313      0      0      0 101000  90200      0 145100
#3      711    623    323    646      0      0      0      0
#4     1285      0    770      0      0      0      0      0

Upvotes: 1

lmo
lmo

Reputation: 38500

You can probably achieve what you are looking for with xtabs which, below, calculates the sum of each location by year for the table that you posted in your answer.

xtabs(Catch ~ Location + Year, df)
        Year
Location   1981   1987   1989   1990   1991   1999   2000
    312       0      0      0      0   1500    410      0
    313       0      0      0 101000  90200      0 145100
    711     623    323    646      0      0      0      0
    1285      0    770      0      0      0      0      0

data

df <- read.table(header=TRUE, text="Year     Location     Catch
1991       0313       45100
1989       0711         323
1991       0312        1100
1991       0313       45100
1989       0711         323
1991       0312         400 
1990       0313      101000
1981       0711         623
1999       0312         410
2000       0313      145100
1987       0711         323
1987       1285         770 ")

Upvotes: 5

Related Questions