Reputation: 2716
I have a data frame with 2332 rows. I want to find the rows that share the same value of "POSTAL", then assign to all of them the values from the row in that group where "area" is largest.
Here are the first 50 rows:
> data[1:50,]
POSTAL x y area
0 12920 573385.9 4972933 8.384062e+06
1 12921 623487.7 4971908 8.233541e+07
2 12923 583786.9 4978081 1.474410e+08
3 12924 613452.4 4927788 1.497106e+07
4 12934 588962.9 4965368 2.194386e+08
5 12935 596550.0 4967100 1.888997e+08
6 12944 618378.6 4921592 2.534854e+07
7 12952 583074.3 4953381 2.943473e+07
8 12955 582523.7 4959810 5.204965e+07
9 12958 611949.9 4979674 9.186815e+07
10 12959 601546.4 4979545 1.037816e+08
11 12962 611088.7 4951280 1.079834e+08
12 12972 612442.2 4934335 2.356099e+08
13 12978 595047.1 4941416 9.280316e+06
14 12979 628230.8 4983172 1.076677e+07
15 12981 591559.5 4944906 3.203060e+08
16 12985 599050.4 4935220 1.643595e+08
17 12992 616585.6 4963995 1.989913e+08
18 12997 592669.1 4914134 2.731502e+07
19 12017 627445.1 4686235 4.773138e+07
20 12024 619994.9 4704246 7.021505e+06
21 12029 629805.8 4696477 5.399608e+07
22 12037 618566.6 4688290 9.184531e+07
23 12060 624089.4 4697165 8.745604e+07
24 12062 622755.7 4709897 8.574364e+06
25 12075 612614.1 4683772 9.799130e+07
26 12106 606331.5 4693118 4.081914e+07
27 12115 615361.6 4702384 3.238215e+06
28 12123 614210.3 4708912 9.383202e+04
29 12123 614210.3 4708912 6.075477e+06
30 12123 614210.3 4708912 6.739686e+03
31 12125 631088.1 4703923 3.758122e+07
32 12130 610476.0 4700356 2.607542e+06
33 12136 618643.1 4698809 5.321862e+07
34 12156 603612.7 4704504 1.373999e+07
35 12156 603612.7 4704504 3.371689e+04
36 12156 603612.7 4704504 1.784716e+04
37 12156 603612.7 4704504 1.493681e+05
38 12156 600920.7 4704250 7.195805e+03
39 12165 623467.2 4685155 8.364310e+06
40 12168 633097.9 4713609 2.418246e+06
41 12173 602210.1 4692849 3.943830e+07
42 12184 610816.1 4697644 1.067326e+08
43 12502 610929.0 4659595 7.862394e+07
44 12503 617592.7 4654358 7.326900e+07
45 12513 606790.9 4673634 9.045891e+06
46 12516 619101.7 4662348 4.084114e+07
47 12517 622938.9 4664008 2.745140e+07
48 12521 611453.2 4669033 8.611940e+07
49 12523 602331.7 4660411 5.620575e+07
Here is the imperfect code I have, which crashes my computer:
n <- 1:nrow(data)
for (i in seq(along = n)) {
  for (j in seq(along = n)) {
    while (data[i,]$POSTAL == data[j,]$POSTAL) {
      if (data[i,]$area < data[j,]$area) {
        (temp2[i,]$x <- temp2[j,]$x) & (temp2[i,]$y <- temp2[j,]$y)
      }
    }
  }
}
Upvotes: 3
Views: 77
Reputation: 66819
My guess at what the OP is seeking is the same as @josilber's. Here's a non-base-R way, using data.table:
library(data.table)
setDT(data)[, c("x","y") := {ii = which.max(area) ; list(x[ii], y[ii])}, by = POSTAL]
(For the example given, this makes only one change: the 39th row, labelled 38 in the printout, takes the x and y of the largest-area row for POSTAL 12156.)
Upvotes: 5
Reputation: 44320
I think you are trying to set the x and y values of every row within each POSTAL group to the x and y of the row in that group where area is largest. You could accomplish this in base R with split-apply-combine:
do.call(rbind, lapply(split(data, data$POSTAL), function(g) {
  # Index of the largest area within this POSTAL group
  ii <- which.max(g$area)
  g$x <- g$x[ii]
  g$y <- g$y[ii]
  g
}))
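Note that split() followed by rbind() returns the rows grouped by POSTAL, so the original row order and row names are not kept. If that matters, one possible variation (a sketch only, not tested on the full 2332-row data) is to use ave() to look up, for every row, the index of the largest-area row in its group. The small data frame below is made up for illustration and just mirrors the question's column names:

```r
# Made-up sample with the question's column names (values are illustrative only).
data <- data.frame(
  POSTAL = c(12123, 12156, 12123, 12156, 12165),
  x      = c(1, 10, 2, 20, 5),
  y      = c(100, 1000, 200, 2000, 500),
  area   = c(3, 7, 9, 4, 1)
)

# For every row, the row index of the largest area within its POSTAL group.
# ave() applies FUN per group and returns the result in the original row order.
idx <- ave(seq_len(nrow(data)), data$POSTAL,
           FUN = function(i) i[which.max(data$area[i])])

# Overwrite x and y with the values from those rows; row order is unchanged.
data$x <- data$x[idx]
data$y <- data$y[idx]
data
```

As with which.max() in the code above, ties in area are resolved by taking the first maximal row in each group.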
Upvotes: 3