Reputation: 59
I have a dataset with multiple countries and I want to create a dummy variable for continents.
My dataset looks like this at the moment:
+---------------+-----------+-----+-----+-----+
| Country       | Period    | X   | Y   | Z   |
+---------------+-----------+-----+-----+-----+
| Argentina     | 1991-1995 | ... | ... | ... |
| Argentina     | 1996-2000 | ... | ... | ... |
| Bolivia       | 1991-1995 | ... | ... | ... |
| Bolivia       | 1996-2000 | ... | ... | ... |
| Brazil        | 1991-1995 | ... | ... | ... |
| Brazil        | 1996-2000 | ... | ... | ... |
| Canada        | 1991-1995 | ... | ... | ... |
| Canada        | 1996-2000 | ... | ... | ... |
| United States | 1991-1995 | ... | ... | ... |
| United States | 1996-2000 | ... | ... | ... |
+---------------+-----------+-----+-----+-----+
My desired output is the following:
+---------------+-----------+-----+-----+-----+---------+---------+
| Country       | Period    | X   | Y   | Z   | dummySA | dummyNA |
+---------------+-----------+-----+-----+-----+---------+---------+
| Argentina     | 1991-1995 | ... | ... | ... | 1       | 0       |
| Argentina     | 1996-2000 | ... | ... | ... | 1       | 0       |
| Bolivia       | 1991-1995 | ... | ... | ... | 1       | 0       |
| Bolivia       | 1996-2000 | ... | ... | ... | 1       | 0       |
| Brazil        | 1991-1995 | ... | ... | ... | 1       | 0       |
| Brazil        | 1996-2000 | ... | ... | ... | 1       | 0       |
| Canada        | 1991-1995 | ... | ... | ... | 0       | 1       |
| Canada        | 1996-2000 | ... | ... | ... | 0       | 1       |
| United States | 1991-1995 | ... | ... | ... | 0       | 1       |
| United States | 1996-2000 | ... | ... | ... | 0       | 1       |
+---------------+-----------+-----+-----+-----+---------+---------+
So, I want to have a dummy for all countries in South America and a dummy for all countries in North America. I know how to create a dummy for a single country or year but not for multiple values.
Upvotes: 1
Views: 1094
Reputation: 887961
If there are only a handful of countries, create the dummy columns with %in%:
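library(dplyr)

# dummySA is 1 for the listed South American countries, 0 otherwise.
# dummyNA is its complement, which is only valid while the data
# contains no countries outside these two regions.
df1 %>%
  mutate(dummySA = as.integer(Country %in%
                                c("Argentina", "Bolivia", "Brazil")),
         dummyNA = as.integer(!dummySA))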
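If the data may later include countries from other regions, taking the complement of dummySA would mislabel them as North American. A minimal sketch that tests each region explicitly is shown here; the toy df1 and the vector names south_america and north_america are assumptions made for illustration, not part of the original data.

```r
library(dplyr)

# Toy data mirroring the question's table (X, Y, Z omitted)
df1 <- data.frame(
  Country = rep(c("Argentina", "Bolivia", "Brazil",
                  "Canada", "United States"), each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5)
)

# Hypothetical membership vectors, one per region
south_america <- c("Argentina", "Bolivia", "Brazil")
north_america <- c("Canada", "United States")

# Each dummy is computed from its own membership test,
# so a country in neither vector gets 0 in both columns
out <- df1 %>%
  mutate(dummySA = as.integer(Country %in% south_america),
         dummyNA = as.integer(Country %in% north_america))
```

Adding another region then only requires one more vector and one more line inside mutate().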
For a longer list of countries, create a key/value dataset that maps each 'Country' to its geographic area, join it to the data, and then create the dummy columns with spread:
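library(dplyr)
library(tidyr)

# keyvaldat: lookup table with columns 'Country' and 'value' (the area)
df1 %>%
  left_join(keyvaldat, by = "Country") %>%
  mutate(n = 1) %>%             # marker that becomes the 1s
  spread(value, n, fill = 0)    # one 0/1 column per area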
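As a self-contained sketch of the join approach: the answer does not show keyvaldat, so its contents and the column name value below are assumptions, and the toy df1 only mirrors the question's table. This version uses pivot_wider(), the newer tidyr function that supersedes spread().

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the question's table (X, Y, Z omitted)
df1 <- tibble(
  Country = rep(c("Argentina", "Bolivia", "Brazil",
                  "Canada", "United States"), each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5)
)

# Hypothetical lookup table: one row per country,
# 'value' holds the name of the dummy column it belongs to
keyvaldat <- tibble(
  Country = c("Argentina", "Bolivia", "Brazil", "Canada", "United States"),
  value   = c("dummySA", "dummySA", "dummySA", "dummyNA", "dummyNA")
)

out <- df1 %>%
  left_join(keyvaldat, by = "Country") %>%   # attach the region to each row
  mutate(n = 1L) %>%                         # marker that becomes the 1s
  pivot_wider(names_from = value,            # one column per region
              values_from = n,
              values_fill = 0L)              # absent combinations become 0
```

A country missing from the lookup table would get an NA region after the join and end up in a column named NA, so it is worth checking the lookup for completeness first.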
Upvotes: 2