viatmincode
viatmincode

Reputation: 35

How can I add specific data from one dataframe to another larger dataframe?

I have two dataframes; df1 is for baseball games and df2 is weather (rain) data by stadiums.

df1
date         Stadium
2019-01-01   Yankee
2019-01-01   Dodger
2019-01-01   Coors
2019-01-02   Dodger
...
df2
date             Yankee     Dodger     Coors
2019-01-01       0          0.1        0.1
2019-01-02       0          0          0
2019-01-03       0          0.2        0.1
...

I would like to add a df1$rain column in df1, using appropriate data from df2. I have tried merging the two data frames, but it returns redundant and long data with too many columns.

Upvotes: 2

Views: 58

Answers (2)

M--
M--

Reputation: 28850

Using data.table:

library(data.table)

melt(setDT(df2), id.vars = 'date', 
                 variable.name = "Stadium", 
                 value.name = "rain")[setDT(df1), on = .(date, Stadium)]

Another tidyverse solution, using tidyr::gather:

library(dplyr)
library(tidyr)

df2 %>% 
  gather(Stadium, rain, -date) %>% 
  right_join(df1)

Data:

df1 <- read.table(text="
date         Stadium
2019-01-01   Yankee
2019-01-01   Dodger
2019-01-01   Coors
2019-01-02   Dodger", 
                  header=T, stringsAsFactors=F)

df2 <- read.table(text="
date             Yankee     Dodger     Coors
2019-01-01       0          0.1        0.1
2019-01-02       0          0          0
2019-01-03       0          0.2        0.1", 
                  header=T, stringsAsFactors=F)

Upvotes: 2

akrun
akrun

Reputation: 887118

One option is match from base R to get the row/column index, use that to extract the values from the second dataset to create the column 'rain' in first dataset

df1$rain <-  df2[-1][cbind(match(df1$date, df2$date), 
                           match(df1$Stadium, names(df2)[-1]))]

Or another option is to convert the second dataset into 'long' format and then do a join

library(dplyr)
library(tidyr)
df2 %>% 
   pivot_longer(cols = -date, names_to = 'Stadium', values_to = 'rain') %>%
   right_join(df1)
# A tibble: 4 x 3
#  date       Stadium  rain
#  <chr>      <chr>   <dbl>
#1 2019-01-01 Yankee    0  
#2 2019-01-01 Dodger    0.1
#3 2019-01-01 Coors     0.1
#4 2019-01-02 Dodger    0  

data

df1 <- structure(list(date = c("2019-01-01", "2019-01-01", "2019-01-01", 
"2019-01-02"), Stadium = c("Yankee", "Dodger", "Coors", "Dodger"
)), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(date = c("2019-01-01", "2019-01-02", "2019-01-03"
), Yankee = c(0L, 0L, 0L), Dodger = c(0.1, 0, 0.2), Coors = c(0.1, 
0, 0.1)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 2

Related Questions