Wolfy

Reputation: 458

Parameters attached to a GeoJSON object disappear when creating a geopandas dataframe

I have this dataframe:

import pandas as pd

d = {
    'geoid': ['13085970205'],
    'FIPS': ['13085'],
    'Year': [2024],
    'parameters': [{"Year": 2024, "hpi_prediction": 304.32205}],
    'geometry':[
        {
            "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026], [-84.126323, 34.39068]]]],
            "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
            "type": "MultiPolygon"
        }
    ]
    
}

dd = pd.DataFrame(data=d)

To write this out, I use geopandas (import geopandas as gpd) to convert the data into a GeoDataFrame like this:

df_geopandas_hpi = gpd.GeoDataFrame(dd[['geoid', 'geometry']])

Once this happens, the parameters key from the original dataframe gets erased. Why? Note that the geometry values in the example dataframe are of type geojson.geometry.MultiPolygon. How can I prevent this from happening?

What I essentially need to do is the following:

import os

import geopandas as gpd

# Note: use `not`, since `~` is bitwise NOT and ~True / ~False are both truthy.
if not os.path.exists('../verus_data'):
    os.mkdir('../verus_data')

for county, df_county in dd.groupby('FIPS'):
    if not os.path.exists('../verus_data/'+str(county)):
        os.mkdir('../verus_data/'+str(county))

    if not os.path.exists('../verus_data/'+str(county)+'/'+'predicted'):
        os.mkdir('../verus_data/'+str(county)+'/'+'predicted')

    if not os.path.exists('../verus_data/'+str(county)+'/'+'analyzed'):
        os.mkdir('../verus_data/'+str(county)+'/'+'analyzed')

    df_hpi = df_county[df_county['key'] == 'hpi']
    df_analyzed = df_county[df_county['key'] == 'analyzed']

    for year, df_year in df_hpi.groupby('Year'):
        if not os.path.exists('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year)):
            os.mkdir('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year))

            df_geopandas_hpi = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])
            df_geopandas_hpi.to_file('../verus_data/'+str(county)+'/'+'predicted'+'/'+str(year)+'/'+'hpi_predictions.geojson', driver="GeoJSON")

    for year, df_year in df_analyzed.groupby('Year'):
        if not os.path.exists('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year)):
            os.mkdir('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year))

            df_geopandas_analyzed = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])
            df_geopandas_analyzed.to_file('../verus_data/'+str(county)+'/'+'analyzed'+'/'+str(year)+'/'+'analyzed_values.geojson', driver="GeoJSON")

I need to somehow write out these GeoJSON files while keeping the parameters key intact.

Upvotes: 0

Views: 54

Answers (2)

Michael Delgado

Reputation: 15452

Geopandas relies on the shapely library to handle geometry objects. Shapely has no concept of parameters or other additional metadata. GeoJSON allows such members at arbitrary levels, but they don't fit the shapely or geopandas data models, so they are dropped when the geometry is parsed.

For example, when parsing with shapely.geometry.shape:

In [10]: shape = shapely.geometry.shape(
    ...:         {
    ...:             "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026], [-84.126323, 34.39068]]]],
    ...:             "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
    ...:             "type": "MultiPolygon"
    ...:         }
    ...:     )

In [11]: shape
Out[11]: <shapely.geometry.multipolygon.MultiPolygon at 0x11040eb60>

In [12]: shape.parameters
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 shape.parameters

AttributeError: 'MultiPolygon' object has no attribute 'parameters'

If you'd like to retain these, you'll need to parse the JSON separately from converting to geopandas. For example, if "parameters" is present in every element, you could simply assign it as a new column:


In [21]: gdf = gpd.GeoDataFrame(dd[["geoid", "geometry"]])
    ...: gdf["parameters"] = dd.geometry.str["parameters"]

In [22]: gdf
Out[22]:
         geoid                                           geometry                                   parameters
0  13085970205  {'coordinates': [[[[-84.126456, 34.389734], [-...  {'Year': 2024, 'hpi_prediction': 304.32205}
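
If the end goal is a GeoJSON file rather than a GeoDataFrame, one option that sidesteps shapely entirely is to build the FeatureCollection by hand. The sketch below uses only the standard library and assumes each row carries a GeoJSON-like geometry dict plus a parameters dict. The helper name rows_to_feature_collection and the choice to store parameters under each feature's "properties" member are my assumptions for illustration, not something geopandas does for you.

```python
import json


def rows_to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection from dicts with 'geoid', 'geometry', 'parameters'."""
    features = []
    for row in rows:
        # Keep only the members that define the geometry itself.
        geometry = {
            "type": row["geometry"]["type"],
            "coordinates": row["geometry"]["coordinates"],
        }
        # Attribute data goes in the feature's "properties" member.
        properties = {"geoid": row["geoid"], "parameters": row["parameters"]}
        features.append(
            {"type": "Feature", "geometry": geometry, "properties": properties}
        )
    return {"type": "FeatureCollection", "features": features}


# Sample row copied from the question's dataframe.
rows = [
    {
        "geoid": "13085970205",
        "geometry": {
            "coordinates": [[[[-84.126456, 34.389734], [-84.12641, 34.39026], [-84.126323, 34.39068]]]],
            "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
            "type": "MultiPolygon",
        },
        "parameters": {"Year": 2024, "hpi_prediction": 304.32205},
    }
]

collection = rows_to_feature_collection(rows)
geojson_text = json.dumps(collection)  # ready to write to a .geojson file
```

With the question's dataframe, the rows could come from dd.to_dict("records"), and geojson_text can be written with an ordinary file write.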

However, if the parameters field is not always present, you may need to do some extra cleaning. You can always access the elements of the geometry column within the pandas dataframe dd directly, e.g.

In [27]: dd.loc[0, "geometry"]["parameters"]["hpi_prediction"]
Out[27]: 304.32205
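
For the case where some geometries lack a parameters member, a small helper along these lines could do the cleaning. This is only a sketch: extract_parameters is a hypothetical name, and it assumes the geometry values behave like mappings (as plain dicts do).

```python
from collections.abc import Mapping


def extract_parameters(geometry, default=None):
    """Return the 'parameters' member of a GeoJSON-like mapping, or `default`."""
    if isinstance(geometry, Mapping):
        return geometry.get("parameters", default)
    # Not a mapping (e.g. None or NaN for a missing geometry).
    return default


with_params = {"type": "MultiPolygon", "coordinates": [], "parameters": {"Year": 2024}}
without_params = {"type": "MultiPolygon", "coordinates": []}

print(extract_parameters(with_params))
print(extract_parameters(without_params))
print(extract_parameters(None, default={}))
```

Applied to the dataframe, this would look like gdf["parameters"] = dd["geometry"].apply(extract_parameters), leaving None wherever the member is absent.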

Upvotes: 1

Wolfy

Reputation: 458

All you have to do is include the parameters column when constructing the GeoDataFrame:

df_geopandas_hpi = gpd.GeoDataFrame(df_year[['geoid', 'geometry', 'parameters']])

Upvotes: 0
