Reputation: 33
So what I have is two different data frames: the one I've been working on (df1) and the one with all the new data I need to put in the first one (df2). Df1 has several columns of zeroes, waiting for the data to be added in. Df2 has the data I need, and several more rows and columns that I don't care about beyond that data. Here is a small subset of the type of data I'm working with.
This is my first time posting my data so I hope I'm doing it right. Let me know if you need a different format.
df1:
structure(list(season = c(" FA15", " FA15", " FA15", " FA15",
" FA15", " FA15", " FA15", " FA15", " FA15", " FA15"), year = c("2015",
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015",
"2015"), territory.name = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), plot = c("0",
"0", "0", "0", "0", "0", "0", "0", "0", "0"), color.band = c("APGBY",
"APGGU", "APGPW", "APGPW", "APGR", "APGUO", "APGUO", "APGUO",
"APGUO", "APGYR")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
df2:
structure(list(bandnum = c(157328052, 160379101, 157328094, 151313455,
170364680, 160379104, 151373458, 157328066, 160379103, 160379105
), project = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("*ISSJ", "ISSJ"), class = "factor"), color.band = c("PAWR",
"WYWAR", "APGP", "APGO", "ABYG", "URYAR", "APBW", "WABG", "OBWAR",
"GBGAR"), sex = structure(c(3L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 1L,
2L), .Label = c("?", "F", "M"), class = "factor"), age = structure(c(2L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("AHY", "ASY",
"HY", "N", "SY"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")
I've been chewing on this problem for a few days, trying different things and reading so many answers on stack overflow, but I'm failing to come up with a clear answer on how to take data from one dataframe and copy it into existing columns in another dataframe based on the shared ID of a third column.
Pretty much, I want r to see that both data frames have in the color.band column a listing for the band ABCDEF, and then take the value from df2$bandnum in the same row as ABCDEF and copy it to df1$bandnum in the ABCDEF row there.
I don't want to copy rows that are in df2 but not df1 into df1. I want to mark entries that exist in df1 but not df2 as N/A in the bandnum column.
Column names and data format for color band and band number have been standardized between the two data frames so everything should line up. What I have so far with code is this:
> practicedf <- left_join(x=df1, y=df2, by = "color.band", all.x = TRUE)
%>% mutate(y = ifelse(is.na(df1$color.band), df1$bandnum, df1$color.band)) %>% select(df2$bandnum)
left_join seems to be the right one because it keeps all rows in the left (df1) data frame and only matching rows from the right (df2) data frame. I get this error though:
Error in `[[<-.data.frame`(`*tmp*`, col, value = c("APGBY", "APGGU", "APGPW", :
replacement has 1261 rows, data has 2559
color.band is a character vector while bandnum is numerical, is that a problem? What could be the problem here?
Edit: I had an error with having the column bandnum in both dataframes so I changed df2$bandnum to bandnum.y. My code is now
df1_test <- left_join(x=df1, y=df2, by = "color.band") %>% mutate(y =
ifelse(is.na(color.band), bandnum, color.band)) %>% select(bandnum.y)
but when I view(df1_test) it only shows me the column bandnum.y and it's not the same number of entries as my original df1
Here's a subset of df1_test (not the whole thing because it's 2600 entries)
Any way I can make it show the rest of my data as well?
structure(list(bandnum.y = c("171324972", "171324972", "171324972",
"178324697", "178324697", "178324697", "178324697", "178324697",
"178324697", "178324697", "170364505", "170364505", "170364505",
"170364505", "170364505", "170364505", NA, "178324692", "178324692",
"178324692")), row.names = c(NA, -20L), class = c("tbl_df", "tbl",
"data.frame"))
Upvotes: 2
Views: 1162
Reputation: 39
If I don't get you wrong, you want to update an old df (df1) with information from a new df (df2).
In data.table
, you can try this:
libraty(data.table)
setDT(df1)
setDT(df2)
update.vars = intersect(names(df1), names(df2)) # update only common variables
df1[df2, c(update.vars) := df2[,update.vars, with=FALSE], on= 'color.band']
Generally this should work. But in the given data the 'merge' ids (color.band
column) are not unique, which may affect the results.
Upvotes: 1
Reputation: 389325
left_join
does not have all.x = TRUE
that is part of base R merge
.
You could do the following in base R :
df1_test <- transform(merge(df1, df2, by = "color.band", all.x = TRUE),
y = ifelse(is.na(color.band), bandnum, color.band))
Upvotes: 1
Reputation: 887951
We cannot use the original dataset 'df1' columns after the join becuase it is a left_join
. In tidyverse
, we specify the unquoted column names. There is no all.x
argument in left_join
. It should be from merge
library(dplyr)
left_join(x=df1, y=df2, by = "color.band") %>%
mutate(y = ifelse(is.na(color.band), bandnum, color.band))
Upvotes: 1