mlangenfeld
mlangenfeld

Reputation: 21

Calculating the total Area of a GeoDataFrame in Python

I have a GeoDataFrame with the geometry of South America. I wanted to calculate the area and did this the following way:

south_america.geometry.to_crs(epsg=3035)
south_america.loc[:, "AREA"] = south_america.geometry.area / 10**6
totalArea = south_america.AREA.sum()

The result should be 17759005.81506123, but is 0.001547957692761746.

I know that the correct version is:

totalArea = sum(south_america.geometry.to_crs(epsg=3035).area) / 10**6

but I don't understand where the difference is. Can you please explain it to me?

Upvotes: 1

Views: 708

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31146

  • to calculate areas in hectares it's necessary to use UTM CRS
  • there are multiple UTM CRS zones in South America. Using one will reduce accuracy
  • have calculated country areas using a single UTM CRS and a UTM CRS by country. Later gets closer to result you want
  • I believe some countries will be covered by multiple UTM CRS zones, so even by country there is potential for some error. Also depends on accuracy of geometry, how used lowres so this will also have some impact on accuracy
  • results show both one UTM CRS zone for all countries and per country UTM CRS zone provide a result reasonably close to your expected answer

results comparison

1.0345126720577646 1.014198080585325
import geopandas as gpd

gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

# filter to south america and calculate area is hectares based on UTM geometry
south_america = gdf.loc[gdf["continent"].eq("South America")].assign(
    area=lambda d: d.to_crs(d.estimate_utm_crs()).area / 10**6,
    area_2=lambda d: [
        gpd.GeoDataFrame(geometry=[g], crs=gdf.crs).pipe(
            lambda d: d.to_crs(d.estimate_utm_crs()).area.sum() / 10**6
        )
        for g in d["geometry"]
    ],
)

# compare results to expected using global UTM approach and UTM by country
a = 17759005.81506123
print(south_america["area"].sum() / a, south_america["area_2"].sum() / a)

Upvotes: 1

Related Questions