Reputation: 35
I have two dataframes; df1
is for baseball games and df2
is weather (rain) data by stadiums.
df1
date Stadium
2019-01-01 Yankee
2019-01-01 Dodger
2019-01-01 Coors
2019-01-02 Dodger
...
df2
date Yankee Dodger Coors
2019-01-01 0 0.1 0.1
2019-01-02 0 0 0
2019-01-03 0 0.2 0.1
...
I would like to add a df1$rain
column in df1
, using appropriate data from df2
. I have tried merging the two data frames, but it returns redundant and long data with too many columns.
Upvotes: 2
Views: 58
Reputation: 28850
Using data.table
:
library(data.table)
melt(setDT(df2), id.vars = 'date',
variable.name = "Stadium",
value.name = "rain")[setDT(df1), on = .(date, Stadium)]
Another tidyverse
solution, using tidyr::gather
:
library(dplyr)
library(tidyr)
df2 %>%
gather(Stadium, rain, -date) %>%
right_join(df1)
Data:
df1 <- read.table(text="
date Stadium
2019-01-01 Yankee
2019-01-01 Dodger
2019-01-01 Coors
2019-01-02 Dodger",
header=T, stringsAsFactors=F)
df2 <- read.table(text="
date Yankee Dodger Coors
2019-01-01 0 0.1 0.1
2019-01-02 0 0 0
2019-01-03 0 0.2 0.1",
header=T, stringsAsFactors=F)
Upvotes: 2
Reputation: 887118
One option is match
from base R
to get the row/column
index, use that to extract the values from the second dataset to create the column 'rain' in first dataset
df1$rain <- df2[-1][cbind(match(df1$date, df2$date),
match(df1$Stadium, names(df2)[-1]))]
Or another option is to convert the second dataset into 'long' format and then do a join
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = -date, names_to = 'Stadium', values_to = 'rain') %>%
right_join(df1)
# A tibble: 4 x 3
# date Stadium rain
# <chr> <chr> <dbl>
#1 2019-01-01 Yankee 0
#2 2019-01-01 Dodger 0.1
#3 2019-01-01 Coors 0.1
#4 2019-01-02 Dodger 0
df1 <- structure(list(date = c("2019-01-01", "2019-01-01", "2019-01-01",
"2019-01-02"), Stadium = c("Yankee", "Dodger", "Coors", "Dodger"
)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(date = c("2019-01-01", "2019-01-02", "2019-01-03"
), Yankee = c(0L, 0L, 0L), Dodger = c(0.1, 0, 0.2), Coors = c(0.1,
0, 0.1)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 2