R replace value with the value shown by an index

Question

I have a table called "merged" like:

  Nationality CustomerID_count ClusterId
1   argentina          1         1
2   ARGENTINA         26         1
3   ARGENTINO          1         1
4   argentona          1         1
5   boliviana         14         2
6    paragauy          1         3
7    paraguay          1         3
8    PARAGUAY          1         3

I need to create a new Nationality column, searching the max value of Customer_ID_count within each cluster.

I did this other table with the following code:

merged1<-data.table(merged)
merged2<-merged1[, which.max(CustomerID), by = ClusterId]

And I got:

ClusterId V1
1:         1  2
2:         2  1
3:         3  1

After that I did a merge:

tot<-merge(x=merged, y=merged2, by= "ClusterId", all.x=TRUE)

And I got the following table:

 ClusterId Nationality CustomerID V1
1         1   argentina          1  2
2         1   ARGENTINA         26  2
3         1   ARGENTINO          1  2
4         1   argentona          1  2
5         2   boliviana         14  1
6         3    paragauy          1  1
7         3    paraguay          1  1
8         3    PARAGUAY          1  1

But I didn't know how to finish. I tried this:

 tot[,5]=tot[V1,5]

Because I want to have for each row the Nationality that is in the line shown in column V1. This didn't work.

How can I do the last part? and also is there a better way to solve this?

Thanks!

Z&#233; Loff · Accepted Answer

Note that you may have more that one CustomerID_count that matches the maximum value (e.g. all versions of "paraguay" have CustomerID_count == 1, which is the max for that cluster).

It's very easy using the plyr package:

library(plyr)
ddply(merged, .(ClusterId), mutate, Nationality2 = Nationality[CustomerID_count == max(CustomerID_count)])

R replace value with the value shown by an index

Answers (2)

Related Questions