Reputation: 101
I'm attempting to transform data from the Global Terrorism Database so that instead of the unit being terror events, it will be "Country_Year" with one variable having the number of terror events that year.
I've managed to create a dataframe that has all one column with all the Country_Year
combinations as one variable. I've also find that by using `
´table(GTD_94_Land$country_txt, GTD_94_Land$iyear)´ the table shows the values that I would like the new variable to have. What I can't figure out is how to store this number as a variable.
So my data look like this
eventid iyear crit1 crit2 crit3 country country_txt
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 199401010008 1994 1 1 1 182 Somalia
2 199401010012 1994 1 1 1 209 Turkey
3 199401010013 1994 1 1 1 209 Turkey
4 199401020003 1994 1 1 1 209 Turkey
5 199401020007 1994 1 1 0 106 Kuwait
6 199401030002 1994 1 1 1 209 Turkey
7 199401030003 1994 1 1 1 228 Yemen
8 199401030006 1994 1 1 0 53 Cyprus
9 199401040005 1994 1 1 0 209 Turkey
10 199401040006 1994 1 1 0 209 Turkey
11 199401040007 1994 1 1 1 209 Turkey
12 199401040008 1994 1 1 1 209 Turkey
and I would like to transform so that I had
Terror attacks iyear crit1 crit2 crit3 country country_txt
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 1994 1 1 1 182 Somalia
2 8 1994 1 1 1 209 Turkey
5 1 1994 1 1 0 106 Kuwait
7 1 1994 1 1 1 228 Yemen
8 1 1994 1 1 0 53 Cyprus
´´´
I've looked at some solutions but most of them seems to assume that the number the new variable should have already is in the data.
All help is appreciated!
Upvotes: 1
Views: 305
Reputation: 1231
Here's a data.table
solution. If the data set has already been filtered to have crit1
and crit2
equal to 1 (which you gave as a condition in a comment), you can remove the first argument (crit1 == 1 & crit2 == 1
)
library(data.table)
set.seed(1011)
dat <- data.table(eventid = round(runif(100, 1000, 10000)),
iyear = sample(1994:1996, 100, rep = T),
crit1 = rbinom(100, 1, .9),
crit2 = rbinom(100, 1, .9),
crit3 = rbinom(100, 1, .9),
country = sample(1:3, 100, rep = T))
dat[, country_txt := LETTERS[country]]
## remove crit variables
dat[crit1 == 1 & crit2 == 1, .N, .(country, country_txt, iyear)]
#> country country_txt iyear N
#> 1: 1 A 1994 10
#> 2: 1 A 1995 4
#> 3: 3 C 1995 10
#> 4: 1 A 1996 7
#> 5: 2 B 1996 9
#> 6: 3 C 1996 5
#> 7: 2 B 1994 8
#> 8: 3 C 1994 13
#> 9: 2 B 1995 10
Created on 2019-09-24 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 499
Assuming df is the original dataframe:
df_out = df %>%
dplyr::select(-eventid) %>%
dplyr::group_by(country_txt,iyear) %>%
dplyr::mutate(Terrorattacs = n()) %>%
dplyr::slice(1L) %>%
dplyr::ungroup()
Ideally, I would use summarise but since I don't know the summarising criteria for other columns, I have simply used mutate and slice.
Note: The 'crit' columns values would be the first occurrence of the 'country_txt' and 'iyear'.
Upvotes: 1