tds

Reputation: 21

Using Python, can one convert a .tiff file to a .gpkg file such that all raster cells are kept indiscriminately?

I would like to convert a .tiff file into a .gpkg file, such that every raster cell is accounted for.

When I convert my .tiff to an image array with numpy, I get a value of 350403 returned for the length.

When I attempt the same using gpd.GeoDataFrame.from_features(img_src), I get a value of 343003.

Is there a way that I can ensure that the GeoDataFrame is going to keep all the cells/pixels?

Here is my code; I apologize that it is not the best.

import geopandas as gpd
import numpy as np
import rasterio
from PIL import Image
from rasterio.features import shapes
from shapely.geometry import shape


def tiff_to_csv(tiff_file: str, csv_file: str):
    '''
    Description:
        - This function converts a TIFF file to a CSV file.

    Parameters:
        tiff_file:
            - the path to the TIFF file to be converted
        csv_file:
            - the path to save the CSV file

    Returns:
        None
    '''

    # Open the TIFF file
    tif_image = Image.open(tiff_file)

    # Convert my image to a numpy array
    image_array = np.array(tif_image)

    # Flatten the array to convert it to a 1D vector
    vector_data = image_array.flatten()

    print(len(vector_data))

    # Save the pixel values to a CSV file
    np.savetxt(csv_file, vector_data, delimiter=",")

    with rasterio.open(tiff_file) as src:
        band1 = src.read(1)

        no_data = src.nodata
        print(f'no_data: {no_data}')

        # Generate shapes (polygons) from the raster values
        results = (
            {'properties': {'raster_val': v}, 'geometry': shape(s)}
            for i, (s, v) in enumerate(shapes(band1, transform=src.transform))
        )

        # Convert my shapes to a GeoDataFrame
        # (pass the generated features, not the raw band array)
        gdf = gpd.GeoDataFrame.from_features(results)
        total_shapes = len(gdf)
        print(f"Number of shapes (including nodata): {total_shapes}")

Interestingly, there are no 'no_data' values, but there is definitely a mismatch in length outputs.

Upvotes: 0

Views: 179

Answers (1)

Bera

Reputation: 1949

When you vectorize with rasterio, adjacent pixels with the same value are dissolved into a single polygon, which is why you get fewer polygons than raster pixels.
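To see the effect without any GIS libraries, here is a minimal sketch that counts groups of touching, equal-valued cells in a tiny grid. It assumes 4-connectivity (cells sharing an edge), which is the default for rasterio's shapes. The helper name count_regions and the sample grid are made up for illustration; this is not how rasterio does it internally.

```python
from collections import deque


def count_regions(grid):
    """Count 4-connected regions of equal value in a 2D list of lists."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            # A new, unvisited cell starts a new region
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the edge-sharing neighbours that hold the same value
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] == grid[y][x]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


# Hypothetical 3x3 raster: 9 pixels, but several neighbours share a value
grid = [
    [1, 1, 2],
    [1, 3, 2],
    [4, 4, 4],
]
n_pixels = sum(len(row) for row in grid)
n_regions = count_regions(grid)
print(n_pixels, n_regions)  # 9 4
```

Nine pixels collapse into four regions here, which is the same kind of gap as 350403 pixels versus 343003 polygons in the question.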

To work around this, you can add a random float between 0 and 1 to each value in the raster array so that each pixel value is unique (hopefully, unless you are very unlucky), then vectorize into a GeoPandas dataframe and floor the values back to the originals. This assumes the raster holds integer values:

import rasterio
from rasterio.features import shapes
import geopandas as gpd
import numpy as np

raster_file = r"C:\Users\bera\Desktop\gistest\random_raster.tif"

#Vectorize the raster into a Geopandas dataframe
with rasterio.open(raster_file) as src:
    array = src.read(1)
    print(array.shape)
    #(26, 33). There are 858 pixels in my raster
    results = (
      {'properties': {'raster_val': v}, 'geometry': s}
      for i, (s, v) in enumerate(shapes(source=array, mask=None, transform=src.transform)))
    geoms  = list(results)
    df = gpd.GeoDataFrame.from_features(geoms)

print(df.shape)
#(679, 2)
#So 26*33 - 679 = 179 cells were merged into neighbouring polygons
df.to_file(r"C:\gistest\vectorized.gpkg")

#Add a random float between 0-1 to each array value
random_floats = np.random.rand(*array.shape)
array_with_random_decimals = (array + random_floats).astype("float32")

#Vectorize
results = (
      {'properties': {'raster_val': v}, 'geometry': s}
      for i, (s, v) in enumerate(shapes(source=array_with_random_decimals, mask=None, transform=src.transform)))
geoms  = list(results)
df2 = gpd.GeoDataFrame.from_features(geoms)

print(df2.shape[0])
#858. Which equals the number of values in the input raster.
#Floor the values back to the original values
df2["raster_val"] = np.floor(df2["raster_val"])
df2.to_file(r"C:\gistest\vectorized_with_random_addition.gpkg")
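A different option, which does not depend on the random offsets staying unique, is to skip shapes altogether and build one square polygon per pixel directly from the affine transform. The sketch below is only an illustration of that idea: pixel_features is a hypothetical helper, the values and transform are made-up sample data, and the six coefficients are assumed to follow the ordering of the first six values of src.transform (x = a*col + b*row + c, y = d*col + e*row + f).

```python
def pixel_features(values, transform):
    """Yield one GeoJSON-like polygon feature per pixel.

    values: 2D list of pixel values (a list of rows).
    transform: six affine coefficients (a, b, c, d, e, f), where
        x = a*col + b*row + c and y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform

    def to_xy(col, row):
        # Map a pixel corner (col, row) to map coordinates
        return (a * col + b * row + c, d * col + e * row + f)

    for row, line in enumerate(values):
        for col, value in enumerate(line):
            # Closed ring around the pixel: the first corner is repeated at the end
            ring = [
                to_xy(col, row),
                to_xy(col + 1, row),
                to_xy(col + 1, row + 1),
                to_xy(col, row + 1),
                to_xy(col, row),
            ]
            yield {
                "type": "Feature",
                "properties": {"raster_val": value},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }


# Made-up sample: 2 rows x 3 columns, 10-unit pixels, origin at (500, 1000)
values = [[1, 1, 2], [1, 3, 2]]
transform = (10.0, 0.0, 500.0, 0.0, -10.0, 1000.0)
features = list(pixel_features(values, transform))
print(len(features))  # 6, one feature per pixel
```

With a real raster, something along the lines of gpd.GeoDataFrame.from_features(pixel_features(src.read(1).tolist(), tuple(src.transform)[:6]), crs=src.crs) should give one row per pixel, though I have not run that on the files above. The values are never modified, so no flooring is needed afterwards.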


Upvotes: 0
