Eric Nilsen
Eric Nilsen

Reputation: 101

Attempting to create panel-data from cross sectional data

I'm attempting to transform data from the Global Terrorism Database so that instead of the unit being terror events, it will be "Country_Year" with one variable having the number of terror events that year.

I've managed to create a dataframe that has all one column with all the Country_Year combinations as one variable. I've also find that by using ` ´table(GTD_94_Land$country_txt, GTD_94_Land$iyear)´ the table shows the values that I would like the new variable to have. What I can't figure out is how to store this number as a variable.

So my data look like this

        eventid iyear crit1 crit2 crit3 country country_txt
      <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>      
 1 199401010008  1994     1     1     1     182 Somalia    
 2 199401010012  1994     1     1     1     209 Turkey     
 3 199401010013  1994     1     1     1     209 Turkey     
 4 199401020003  1994     1     1     1     209 Turkey     
 5 199401020007  1994     1     1     0     106 Kuwait     
 6 199401030002  1994     1     1     1     209 Turkey     
 7 199401030003  1994     1     1     1     228 Yemen      
 8 199401030006  1994     1     1     0      53 Cyprus     
 9 199401040005  1994     1     1     0     209 Turkey     
10 199401040006  1994     1     1     0     209 Turkey     
11 199401040007  1994     1     1     1     209 Turkey     
12 199401040008  1994     1     1     1     209 Turkey 

and I would like to transform so that I had

Terror attacks iyear crit1 crit2 crit3 country country_txt
          <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>      
 1 1  1994     1     1     1     182 Somalia    
 2 8  1994     1     1     1     209 Turkey     
 5 1  1994     1     1     0     106 Kuwait     
  7 1  1994    1     1     1     228 Yemen      
 8 1  1994     1     1     0      53 Cyprus     
´´´

I've looked at some solutions but most of them seems to assume that the number the new variable should have already is in the data. 

All help is appreciated!

Upvotes: 1

Views: 305

Answers (2)

ClancyStats
ClancyStats

Reputation: 1231

Here's a data.table solution. If the data set has already been filtered to have crit1 and crit2 equal to 1 (which you gave as a condition in a comment), you can remove the first argument (crit1 == 1 & crit2 == 1)

library(data.table)
set.seed(1011)

dat <- data.table(eventid = round(runif(100, 1000, 10000)),
                  iyear = sample(1994:1996, 100, rep = T),
                  crit1 = rbinom(100, 1, .9),
                  crit2 = rbinom(100, 1, .9),
                  crit3 = rbinom(100, 1, .9),
                  country = sample(1:3, 100, rep = T))
dat[, country_txt := LETTERS[country]]

## remove crit variables
dat[crit1 == 1 & crit2 == 1, .N, .(country, country_txt, iyear)]
#>    country country_txt iyear  N
#> 1:       1           A  1994 10
#> 2:       1           A  1995  4
#> 3:       3           C  1995 10
#> 4:       1           A  1996  7
#> 5:       2           B  1996  9
#> 6:       3           C  1996  5
#> 7:       2           B  1994  8
#> 8:       3           C  1994 13
#> 9:       2           B  1995 10

Created on 2019-09-24 by the reprex package (v0.3.0)

Upvotes: 0

makeshift-programmer
makeshift-programmer

Reputation: 499

Assuming df is the original dataframe:

df_out = df %>% 
  dplyr::select(-eventid) %>% 
  dplyr::group_by(country_txt,iyear) %>% 
  dplyr::mutate(Terrorattacs = n()) %>% 
  dplyr::slice(1L) %>% 
  dplyr::ungroup()

Ideally, I would use summarise but since I don't know the summarising criteria for other columns, I have simply used mutate and slice.

Note: The 'crit' columns values would be the first occurrence of the 'country_txt' and 'iyear'.

Upvotes: 1

Related Questions