Reputation: 149
I have a sf object of points within same polygons. I want to aggregate all info by polygon Id and keep only one point per polygon : summing quantitative values and keeping character values of the max population with a fcode starting with "P".
I don't know how I can do that.
Here, an exemple with a sample of my sf (6 points in one polygon)
#My sf with 6 points in the same polygon (Poly.id =1), so I want to aggregate by Poly.id
str(Paris)
Classes ‘sf’, ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 15 variables:
$ geonameId : chr "2988507" "9844895" "2988623" "9843563" ...
$ asciiName : chr "Paris" "Paris 01 Ancien - Quartier Champs-Elysees" "Paris 05 Pantheon" "Paris 09 Ancien - Quartier Hotel-de-Ville" ...
$ members_etmun : num 30 0 0 0 0 0
$ members_urbact : num 3 0 0 0 0 0
$ participations_eucicop: num 175 1 9 13 1 6
$ partners_eucicop : num 151 1 7 8 1 4
$ projects_eucicop : num 156 1 8 13 1 6
$ fcodeName : chr "capital of a political entity" "historical populated place" "populated place" "historical populated place" ...
$ fcode : chr "PPLC" "PPLH" "PPL" "PPLH" ...
$ countryCode : chr "FR" "FR" "FR" "FR" ...
$ continentCode : chr "EU" "EU" "EU" "EU" ...
$ population : chr "2138551" "0" "0" "0" ...
$ CityId : int 2 3 4 5 6 7
$ Poly.id : int 1 1 1 1 1 1
$ geometry :sfc_POINT of length 6; first list element: 'XY' num 3760488 2889158
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr "geonameId" "asciiName" "members_etmun" "members_urbact" ...
Here, the output I want (Get all character variables of the first row, because it's the max of "population" and the fcode starts with "P", then sum variables(3:7)). So something like that :
str(Paris)
Classes ‘sf’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 15 variables:
$ geonameId : chr "2988507"
$ asciiName : chr "Paris"
$ members_etmun : num 30
$ members_urbact : num 3
$ participations_eucicop: num 205
$ partners_eucicop : num 172
$ projects_eucicop : num 185
$ fcodeName : chr "capital of a political entity"
$ fcode : chr "PPLC"
$ countryCode : chr "FR"
$ continentCode : chr "EU"
$ population : chr "2138551"
$ CityId : int 2
$ Poly.id : int 1
$ geometry :sfc_POINT of length 1; first list element: 'XY' num 3760488 2889158
...
I've tried several things with dplyr or the function aggregate but I don't know how to implement conditions (sum of some quantitative variables, and conditional selection of qualitative values)
Below : the info for the reprex
Paris <-
structure(
list(
geonameId = c(
"2988507",
"9844895",
"2988623",
"9843563",
"9843589",
"2970479"
),
asciiName = c(
"Paris",
"Paris 01 Ancien - Quartier Champs-Elysees",
"Paris 05 Pantheon",
"Paris 09 Ancien - Quartier Hotel-de-Ville",
"Paris 10 Ancien - Quartier Invalides",
"Paris 15 Vaugirard"
),
members_etmun = c(30, 0, 0, 0, 0, 0),
members_urbact = c(3,
0, 0, 0, 0, 0),
participations_eucicop = c(175, 1, 9, 13,
1, 6),
partners_eucicop = c(151, 1, 7, 8, 1, 4),
projects_eucicop = c(156,
1, 8, 13, 1, 6),
fcodeName = c(
"capital of a political entity",
"historical populated place",
"populated place",
"historical populated place",
"historical populated place",
"populated place"
),
fcode = c("PPLC",
"PPLH", "PPL", "PPLH", "PPLH", "PPL"),
countryCode = c("FR",
"FR", "FR", "FR", "FR", "FR"),
continentCode = c("EU", "EU",
"EU", "EU", "EU", "EU"),
population = c("2138551", "0", "0",
"0", "0", "0"),
CityId = 2:7,
Poly.id = c(1L, 1L, 1L, 1L,
1L, 1L),
geometry = structure(
list(
structure(
c(3760487.79387524,
2889157.59339236),
class = c("XY", "POINT", "sfg")
),
structure(
c(3757309.96074126,
2891227.55260787),
class = c("XY", "POINT", "sfg")
),
structure(
c(3760266.48357105,
2888217.53980758),
class = c("XY", "POINT", "sfg")
),
structure(
c(3760964.29140438,
2889376.36175204),
class = c("XY", "POINT", "sfg")
),
structure(
c(3757255.00710537,
2889909.86397746),
class = c("XY", "POINT", "sfg")
),
structure(
c(3756810.81051837,
2888176.61058656),
class = c("XY", "POINT", "sfg")
)
),
class = c("sfc_POINT",
"sfc"),
precision = 0,
bbox = structure(
c(
xmin = 3756810.81051837,
ymin = 2888176.61058656,
xmax = 3760964.29140438,
ymax = 2891227.55260787
),
class = "bbox"
),
crs = structure(
list(epsg = 3035L, proj4string = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"),
class = "crs"
),
n_empty = 0L
)
),
row.names = c(NA,
6L),
sf_column = "geometry",
agr = structure(
c(
geonameId = NA_integer_,
asciiName = NA_integer_,
members_etmun = NA_integer_,
members_urbact = NA_integer_,
participations_eucicop = NA_integer_,
partners_eucicop = NA_integer_,
projects_eucicop = NA_integer_,
fcodeName = NA_integer_,
fcode = NA_integer_,
countryCode = NA_integer_,
continentCode = NA_integer_,
population = NA_integer_,
CityId = NA_integer_,
Poly.id = NA_integer_
),
class = "factor",
.Label = c("constant",
"aggregate", "identity")
),
class = c("sf", "tbl_df", "tbl", "data.frame")
)
What kind of function do I have to use in order to do something like this ? Thank you in advance
Upvotes: 0
Views: 209
Reputation: 5336
There is probably an easier way to do this but...
You could use by
to create the row of the dataframe for each polygon id, then rbind
to bind them back together into a single dataframe.
I don't fully understand your 'keeping character values of the max population with an fcode starting with p' but you can adapt the code below if I haven't quite got it right:
## First change the population to a numeric:
Paris$population <- as.numeric(Paris$population)
# This will apply the function in FUN to each subset of d defined by a unique poly.id
polygons <- by(data=Paris, INDICES = list(Paris$Poly.id) , FUN = function(d){
# Identify all the rows with fcode that starts with P
startswithP <- d[substr(d$fcode,1,1)=="P",]
# Now get the row with the highest population from those rows
# There might be more than one of these so just take the first row.
maxpop <- startswithP[startswithP$population == max(startswithP$population),][1,]
# Now construct the row of my dataframe.
# First sum all the numeric rows of d.
# Then identify all the character rows of maxpop
data.frame( as.list(apply(d[sapply(d, is.numeric)],2,sum)),
maxpop[sapply(maxpop, is.character)])
})
# Now I have a list of one-row dataframes I can rbind them together.
Paris.perpolygon <- do.call(rbind, polygons)
Paris.perpolygon
Upvotes: 1