Reputation: 313
I want to work with 4 national football teams (England, Belgium, Germany and France) and n dates
Date Matches
16.03 England X Brazil
16.03 Belgium X Argentina
16.03 Chile X Japan
16.03 Uruguay X Germany
16.03 Italy x France
17.03 South Korea X India
17.03 Germany X France
17.03 Poland X Belgium
17.03 Colombia X Russia
18.03 South Africa X Mexico
18.03 China X Japon
18.03 Brazil X Venezuela
... ...
DESIRED DATAFRAME with dummies. When dummy = 1 the team played. When dummy = 0 the team didnt played (in the day). Important: Only one date per row.
Date Dummy_england Dummy_belgium Dummy_germany Dummy_France
16.03 1 1 1 1
17.03 0 1 1 1
18.03 0 0 0 0
Thank you very much!!
Upvotes: 3
Views: 80
Reputation: 44
Dummy_england <- tapply(Matches, Date, function(x) {
grepl("England", paste(x, collapse = ''))*1
})
Upvotes: 0
Reputation: 887118
We can use tidyverse
methods.
str_extract
filter
pivot_wider
to reshape from long to 'wide' after select
ing the columns of interest - specify the values_fn
as length
and values_fill
as 0 to change the default NA
to 0library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
mutate(team = str_c('Dummy_', str_extract(Matches,
regex('England|Belgium|Germany|France', ignore_case = TRUE)))) %>%
filter(complete.cases(team)) %>%
select(-Matches) %>%
pivot_wider(names_from = team, values_from = team,
values_fn = length, values_fill = 0)
-output
# A tibble: 2 x 5
Date Dummy_England Dummy_Belgium Dummy_Germany Dummy_France
<dbl> <int> <int> <int> <int>
1 16.0 1 1 1 1
2 17.0 0 1 1 0
If we want to keep the 'Date' where there are no matches, use complete
df1 %>%
mutate(team = str_c('Dummy_', str_extract(Matches,
regex('England|Belgium|Germany|France', ignore_case = TRUE)))) %>%
filter(complete.cases(team)) %>%
select(-Matches) %>%
pivot_wider(names_from = team, values_from = team,
values_fn = length, values_fill = 0) %>%
complete(Date = unique(df1$Date), fill = list(Dummy_England = 0,
Dummy_Belgium = 0, Dummy_Germany = 0, Dummy_France = 0))
-output
# A tibble: 3 x 5
Date Dummy_England Dummy_Belgium Dummy_Germany Dummy_France
<dbl> <dbl> <dbl> <dbl> <dbl>
1 16.0 1 1 1 1
2 17.0 0 1 1 0
3 18.0 0 0 0 0
df1 <- structure(list(Date = c(16.03, 16.03, 16.03, 16.03, 16.03, 17.03,
17.03, 17.03, 17.03, 18.03, 18.03, 18.03), Matches = c("England X Brazil",
"Belgium X Argentina", "Chile X Japan", "Uruguay X Germany",
"Italy x France", "South Korea X India", "Germany X France",
"Poland X Belgium", "Colombia X Russia", "South Africa X Mexico",
"China X Japon", "Brazil X Venezuela")), class = "data.frame", row.names = c(NA,
-12L))
Upvotes: 2