Reputation: 477
I have two datasets that look like this:
What I want is to change the values from the second column in the first dataset to the values from the second column from the second dataset. All the names in the first dataset are in the second one, and obviously my dataset is much bigger than that.
I was trying to use R to do that but I am very new at it. I was looking at the intersect command but I am not sure if it's going to work. I don't put any codes because I'm real lost here.
I also need that the order of the first columns (which are names) in the first dataset stays the same, but with the new values from the second column of the second dataset.
Upvotes: 0
Views: 132
Reputation: 292
Agree with @agstudy, a simple use of merge would do the trick. Try something like this:
df1 <- data.frame(name=c("ab23242", "ab35366", "ab47490", "ab59614"),
X=c(72722, 88283, 99999, 114278.333))
df2 <- data.frame(name=c("ab35366", "ab47490", "ab59614", "ab23242" ),
X=c(12345, 23456, 34567, 456789))
df.merge <- merge(df1, df2, by="name", all.x=T)
df.merge <- df.merge[, -2]
Output:
name X.y
1 ab23242 456789
2 ab35366 12345
3 ab47490 23456
4 ab59614 34567
I think merge will keep order of first frame but you can also keep the order strictly by simply adding a column with order df1$order <- 1:nrow(df1)
and later on sorting based on that column.
Upvotes: 1
Reputation: 59335
How about this:
library(data.table)
# generate some random data
dt.1 <- data.table(id = 1:1000, value=rnorm(1000), key="id")
dt.2 <- data.table(id = 2*(500:1), value=as.numeric(1:500), key="id")
# objective is to replace value in df.1 with value from df.2 where id's match.
# data table joins - very efficient
# dt.1 now has 3 columns: id, value, and value.1 from dt.2$value
dt.1 <-dt.2[dt.1,nomatch=NA]
dt.1[is.na(value),]$value=dt.1[is.na(value),]$value.1
dt.1$value.1=NULL # get rid of extra column
NB: This sorts dt.1
by id
which should be OK since it's sorted that way already.
Also: In future, please include data that can be imported into R. Images are not useful!
Upvotes: 0
Reputation: 263301
df1<- data.frame( name1 = letters[6:10], valuecol1=seq(2,10,by=2))
df2 <- data.frame( name2 = letters[1:10], valuecol2=10:1)
df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ df1$name1 %in% df2$name2 , "valuecol1"]
df2
name2 valuecol2
1 a 10
2 b 9
3 c 8
4 d 7
5 e 6
6 f 2
7 g 4
8 h 6
9 i 8
10 j 10
This is what I thought might work, but doing replacements using indexing with match
sometimes bites me in ways I need to adjust:
df2 [match(df1$name1, df2$name2) , "valuecol2"] <-
df1[ match(df1$name1, df2$name2) , "valuecol1"]
Here's how I tested it (edited).
> df2 <- data.frame( name2 = letters[1:10], valuecol2=10:1)
> df1<- data.frame( name1 = letters[1:5], valuecol1=seq(2,10,by=2))
> df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ match(df1$name1, df2$name2) , "valuecol1"]
> df2
name2 valuecol2
1 a 2
2 b 4
3 c 6
4 d 8
5 e 10
6 f 5
7 g 4
8 h 3
9 i 2
10 j 1
Yep.... bitten again.
> df1<- data.frame( name1 = letters[6:10], valuecol1=seq(2,10,by=2))
> df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ match(df1$name1, df2$name2) , "valuecol1"]
> df2
name2 valuecol2
1 a 2
2 b 4
3 c 6
4 d 8
5 e 10
6 f NA
7 g NA
8 h NA
9 i NA
10 j NA
Upvotes: 0