Reputation: 724
I have two dataframes below:
dput output df1:
structure(list(Location = c("1100 2ND AVENUE", "1100 2ND AVENUE",
"1100 2ND AVENUE", "1100 2ND AVENUE", "1100 2ND AVENUE", "1100 2ND AVENUE"
), `Ivend Name` = c("3 Mskt 1.92oz", "Almond Joy 1.61oz", "Aquafina 20oz",
"BCanyonChptleAdzuk1.5oz", "BlkForest FrtSnk 2.25oz", "BluDimndSmkhseAlmd1.5oz"
), `Category Name` = c("Candy", "Candy", "Water", "Salty Snacks",
"Candy", "Nuts/Trailmix"), Calories = c(240, 220, 0, 215, 193,
260), Sugars = c("36", "20", "0", "2", "32", "2"), Month = structure(c(4L,
4L, 4L, 4L, 4L, 4L), .Label = c("Oct", "Nov", "Dec", "Jan", "Feb",
"Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep"), class = "factor"),
Products_available_per_machine = c(0, 0, 0, 0, 0, 0), Units_sold = c(0,
0, 0, 0, 0, 0), Total_Sales = c(0, 0, 0, 0, 0, 0), Spoils = c(0,
0, 0, 0, 0, 0), Building = c("1100 2ND", "1100 2ND", "1100 2ND",
"1100 2ND", "1100 2ND", "1100 2ND"), Item = structure(c(2L,
2L, 1L, 2L, 2L, 2L), .Label = c("Beverage", "Food"), class = "factor"),
Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2019", class = "factor")), row.names = c(NA,
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x00000233561b1ef0>)
dput output df2:
structure(list(`Date Ran` = structure(c(1548892800, 1551312000,
1553817600, 1556582400, 1561680000, 1564531200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Year = c(2019, 2019, 2019, 2019, 2019,
2019), Month = c("January", "February", "March", "April", "June",
"July"), Location = c("SEA18", "SEA18", "SEA18", "SEA18", "SEA18",
"SEA18"), Building = c("Alexandria", "Alexandria", "Alexandria",
"Alexandria", "Alexandria", "Alexandria"), Population = c(1177,
1179, 1178, 1156, 1163, 1163)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I want to pull the pop col from DF 2 and add it to Dataframe 1 based on 'Building' and 'Month' in the order population is filled in DF2.
I tried this command using merge but the col is NULL when I execute:
df_2019_final1$Population <- df_2019_pop$Population[match(df_2019_final1$Month, df_2019_pop$Month, df_2019_final1$Building, df_2019_pop$Building)]
subset_df_pop <- df_2019_pop[, c("Month", "Building", "Population")]
updated_2019_test <- merge(df_2019_final1, subset_df_pop, by = c('Month', 'Building'))
Both produce NULLS and a blank DF
Any help would be greatly appreciated.
Upvotes: 1
Views: 69
Reputation: 887971
In one of the datasets, the 'Month' is abbreviated and in the second it is full name. We can adjust to either one of those formats and the merge
would work
df2$MonthN <- month.abb[match(df2$Month, month.name)]
library(dplyr)
left_join(df1, df2[, c("MonthN", "Building", "Population")],
by = c('Month' = 'MonthN', 'Building'))
Or with merge
merge(df1, df2[, c("MonthN", "Building", "Population")],
by.x = c('Month', 'Building'), by.y = c('MonthN', 'Building'), all.x = TRUE)
Note: The Population column on the merged dataset will be NA
based on the example as "Building" values are different in the subset datasets
Upvotes: 1