JM88
JM88

Reputation: 477

Changing the values of a column for the values from another column

I have two datasets that look like this:

https://i.sstatic.net/ROUiF.png

What I want is to change the values from the second column in the first dataset to the values from the second column from the second dataset. All the names in the first dataset are in the second one, and obviously my dataset is much bigger than that.

I was trying to use R to do that but I am very new at it. I was looking at the intersect command but I am not sure if it's going to work. I don't put any codes because I'm real lost here.

I also need that the order of the first columns (which are names) in the first dataset stays the same, but with the new values from the second column of the second dataset.

Upvotes: 0

Views: 132

Answers (3)

Dev Patel
Dev Patel

Reputation: 292

Agree with @agstudy, a simple use of merge would do the trick. Try something like this:

df1 <- data.frame(name=c("ab23242", "ab35366", "ab47490", "ab59614"), 
              X=c(72722, 88283, 99999, 114278.333))

df2 <- data.frame(name=c("ab35366", "ab47490", "ab59614", "ab23242" ), 
                  X=c(12345, 23456, 34567, 456789))
df.merge <- merge(df1, df2, by="name", all.x=T)
df.merge <- df.merge[, -2]

Output:

     name    X.y
1 ab23242 456789
2 ab35366  12345
3 ab47490  23456
4 ab59614  34567

I think merge will keep order of first frame but you can also keep the order strictly by simply adding a column with order df1$order <- 1:nrow(df1) and later on sorting based on that column.

Upvotes: 1

jlhoward
jlhoward

Reputation: 59335

How about this:

library(data.table)
#  generate some random data
dt.1 <- data.table(id = 1:1000, value=rnorm(1000), key="id")
dt.2 <- data.table(id = 2*(500:1), value=as.numeric(1:500), key="id")

# objective is to replace value in df.1 with value from df.2 where id's match.
# data table joins - very efficient
# dt.1 now has 3 columns: id, value, and value.1 from dt.2$value
dt.1 <-dt.2[dt.1,nomatch=NA]
dt.1[is.na(value),]$value=dt.1[is.na(value),]$value.1
dt.1$value.1=NULL      # get rid of extra column

NB: This sorts dt.1 by id which should be OK since it's sorted that way already.

Also: In future, please include data that can be imported into R. Images are not useful!

Upvotes: 0

IRTFM
IRTFM

Reputation: 263301

df1<- data.frame( name1 = letters[6:10], valuecol1=seq(2,10,by=2))
df2 <- data.frame( name2 = letters[1:10], valuecol2=10:1)
df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ df1$name1 %in% df2$name2 , "valuecol1"]
df2

   name2 valuecol2
1      a        10
2      b         9
3      c         8
4      d         7
5      e         6
6      f         2
7      g         4
8      h         6
9      i         8
10     j        10

This is what I thought might work, but doing replacements using indexing with match sometimes bites me in ways I need to adjust:

 df2 [match(df1$name1, df2$name2) , "valuecol2"] <- 
                          df1[ match(df1$name1, df2$name2) , "valuecol1"]

Here's how I tested it (edited).

> df2 <- data.frame( name2 = letters[1:10], valuecol2=10:1)
> df1<- data.frame( name1 = letters[1:5], valuecol1=seq(2,10,by=2))
> df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ match(df1$name1, df2$name2) , "valuecol1"]
> df2
   name2 valuecol2
1      a         2
2      b         4
3      c         6
4      d         8
5      e        10
6      f         5
7      g         4
8      h         3
9      i         2
10     j         1

Yep.... bitten again.

> df1<- data.frame( name1 = letters[6:10], valuecol1=seq(2,10,by=2))
> df2 [ match(df1$name1, df2$name2) , "valuecol2"] <- df1[ match(df1$name1, df2$name2) , "valuecol1"]
> df2
   name2 valuecol2
1      a         2
2      b         4
3      c         6
4      d         8
5      e        10
6      f        NA
7      g        NA
8      h        NA
9      i        NA
10     j        NA

Upvotes: 0

Related Questions