GabyLP
GabyLP

Reputation: 3781

R replace value with the value shown by an index

I have a table called "merged" like:

  Nationality CustomerID_count ClusterId
1   argentina          1         1
2   ARGENTINA         26         1
3   ARGENTINO          1         1
4   argentona          1         1
5   boliviana         14         2
6    paragauy          1         3
7    paraguay          1         3
8    PARAGUAY          1         3

I need to create a new Nationality column, searching the max value of Customer_ID_count within each cluster.

I did this other table with the following code:

merged1<-data.table(merged)
merged2<-merged1[, which.max(CustomerID), by = ClusterId]

And I got:

ClusterId V1
1:         1  2
2:         2  1
3:         3  1

After that I did a merge:

tot<-merge(x=merged, y=merged2, by= "ClusterId", all.x=TRUE)

And I got the following table:

 ClusterId Nationality CustomerID V1
1         1   argentina          1  2
2         1   ARGENTINA         26  2
3         1   ARGENTINO          1  2
4         1   argentona          1  2
5         2   boliviana         14  1
6         3    paragauy          1  1
7         3    paraguay          1  1
8         3    PARAGUAY          1  1

But I didn't know how to finish. I tried this:

 tot[,5]=tot[V1,5]

Because I want to have for each row the Nationality that is in the line shown in column V1. This didn't work.

How can I do the last part? and also is there a better way to solve this?

Thanks!

Upvotes: 0

Views: 47

Answers (2)

Z&#233; Loff
Z&#233; Loff

Reputation: 1712

Note that you may have more that one CustomerID_count that matches the maximum value (e.g. all versions of "paraguay" have CustomerID_count == 1, which is the max for that cluster).

It's very easy using the plyr package:

library(plyr)
ddply(merged, .(ClusterId), mutate, Nationality2 = Nationality[CustomerID_count == max(CustomerID_count)])

Upvotes: 1

hrbrmstr
hrbrmstr

Reputation: 78832

This could be a good use-case for `dplyr:

library(dplyr)


merged <- merged %>% 
          group_by(ClusterId) %>%
          mutate(newNat=Nationality[CustomerID_count == max(CustomerID_count)]) %>%
          ungroup

print(merged)

##     Source: local data frame [8 x 4]
## 
##   Nationality CustomerID_count ClusterId    newNat
## 1   argentina                1         1 ARGENTINA
## 2   ARGENTINA               26         1 ARGENTINA
## 3   ARGENTINO                1         1 ARGENTINA
## 4   argentona                1         1 ARGENTINA
## 5   boliviana               14         2 boliviana
## 6    paragauy                1         3  paragauy
## 7    paraguay                1         3  paraguay
## 8    PARAGUAY                1         3  PARAGUAY

Upvotes: 1

Related Questions