Reputation: 39
The question title might not completely reflect my problem, and that's perhaps the reason why I cannot come up with a solution for my problem. I have read simmilar questions (e.g., Assign a value to column based on condition across rows or R: Generate a dummy variable based on the existence of one column' value in another column) and on-line guides on creating dummy variables in R (I'm quite new to R), but no-one seems to tackle my problem, or perhaps I just couldn't see how. I have a dataframe like this:
df <- data.frame("Country" = c("US", "US", "US", "US", "IT", "IT", "IT","FR","FR"),
"Time" = c(1, 1, 2, 3, 1, 2, 1, 2, 3))
The dataframe is more complex, but each row is an observation of one country at a given moment in time. I want to create a dummy variable that takes the value 1 for an observation if the country to which this observation is assigned was measured at time 1, 2 3 and 0 otherwise and another dummy that takes the value 1 if the country was measured at the moments 2 and 3 but not one, and 0 otherwise. So the dataframe would look like:
df2 <- data.frame("Country" = c("US", "US", "US", "US", "IT", "IT", "IT","FR","FR"),
"Time" = c(1, 1, 2, 3, 1, 2, 1, 2, 3),
"DummyTime123" = c(1, 1, 1, 1, 0, 0, 0, 0, 0),
"DummyTime23" = c(0, 0, 0, 0, 0, 0, 0, 1, 1))
So, since US was measured at times 1, 2 and 3, american observations take the value 1, and the rest of the observations take the value 0. However, since FR was measured at times 2 and 3, French observations take the value 1, and the rest take the value 0 (note that also US takes the value 0, because it was measured at times 1, 2 and 3, and not only at times 2 and 3).
I have tried to create a dummy for instance with if_else within the tidyverse, like
DummyTime123 = ifelse(country = country, time = 1 & time = 2 & time = 3)
But this does not work, quite reasonably, because no single observation is measured at time 1, 2 and 3. Instead, I want to create a dummy for that observation based on whether the value of one column for this observation (country) is measured at several (and specific) times. I have also considered that my dataframe could be untidy, but I cannot see how and I don't think that's the problem. Of course, I could do this manually (that's what I did so far), but since the dataset is quite large, I would like to find an automated solution.
¿Does anybody have a solution for this problem? It would be really nice if there was a solution for this within the tidyverse, but of course any solution would be helpful.
Upvotes: 2
Views: 1749
Reputation: 30474
With tidyverse
you could try the following.
Use group_by
with Country
to consider all the Time
values within each Country
.
To satisfy DummyTime123
criteria, you need all
values of 1, 2, and 3 in the Time
values within a Country
. If TRUE
, then using +
this becomes 1.
For DummyTime23
, it sounds like you want both 2 and 3 in Time
but do not want any
values of Time
to be 1. Using &
you can make sure both criteria are satisfied.
Let me know if this provides the results expected.
library(tidyverse)
df %>%
group_by(Country) %>%
mutate(DummyTime123 = +all(1:3 %in% Time),
DummyTime23 = +(all(2:3 %in% Time) & !any(Time == 1)))
Output
Country Time DummyTime123 DummyTime23
<chr> <dbl> <int> <int>
1 US 1 1 0
2 US 1 1 0
3 US 2 1 0
4 US 3 1 0
5 IT 1 0 0
6 IT 2 0 0
7 IT 1 0 0
8 FR 2 0 1
9 FR 3 0 1
Upvotes: 2