Reputation: 3781
I have a table called "merged" like:
Nationality CustomerID_count ClusterId
1 argentina 1 1
2 ARGENTINA 26 1
3 ARGENTINO 1 1
4 argentona 1 1
5 boliviana 14 2
6 paragauy 1 3
7 paraguay 1 3
8 PARAGUAY 1 3
I need to create a new Nationality column, searching the max value of Customer_ID_count within each cluster.
I did this other table with the following code:
merged1<-data.table(merged)
merged2<-merged1[, which.max(CustomerID), by = ClusterId]
And I got:
ClusterId V1
1: 1 2
2: 2 1
3: 3 1
After that I did a merge:
tot<-merge(x=merged, y=merged2, by= "ClusterId", all.x=TRUE)
And I got the following table:
ClusterId Nationality CustomerID V1
1 1 argentina 1 2
2 1 ARGENTINA 26 2
3 1 ARGENTINO 1 2
4 1 argentona 1 2
5 2 boliviana 14 1
6 3 paragauy 1 1
7 3 paraguay 1 1
8 3 PARAGUAY 1 1
But I didn't know how to finish. I tried this:
tot[,5]=tot[V1,5]
Because I want to have for each row the Nationality that is in the line shown in column V1. This didn't work.
How can I do the last part? and also is there a better way to solve this?
Thanks!
Upvotes: 0
Views: 47
Reputation: 1712
Note that you may have more that one CustomerID_count
that matches the maximum value (e.g. all versions of "paraguay" have CustomerID_count
== 1, which is the max for that cluster).
It's very easy using the plyr
package:
library(plyr)
ddply(merged, .(ClusterId), mutate, Nationality2 = Nationality[CustomerID_count == max(CustomerID_count)])
Upvotes: 1
Reputation: 78832
This could be a good use-case for `dplyr:
library(dplyr)
merged <- merged %>%
group_by(ClusterId) %>%
mutate(newNat=Nationality[CustomerID_count == max(CustomerID_count)]) %>%
ungroup
print(merged)
## Source: local data frame [8 x 4]
##
## Nationality CustomerID_count ClusterId newNat
## 1 argentina 1 1 ARGENTINA
## 2 ARGENTINA 26 1 ARGENTINA
## 3 ARGENTINO 1 1 ARGENTINA
## 4 argentona 1 1 ARGENTINA
## 5 boliviana 14 2 boliviana
## 6 paragauy 1 3 paragauy
## 7 paraguay 1 3 paraguay
## 8 PARAGUAY 1 3 PARAGUAY
Upvotes: 1