I would like to convert a .tiff file into a .gpkg file such that every raster cell is accounted for.
When I convert my .tiff to a NumPy array and flatten it, the length is 350403.
When I attempt the same using gpd.GeoDataFrame.from_features(img_src), the length is 343003.
Is there a way to ensure that the GeoDataFrame keeps all the cells/pixels?
Here is my code; I apologize that it is not the best.
import numpy as np
import geopandas as gpd
import rasterio
from PIL import Image
from rasterio.features import shapes
from shapely.geometry import shape


def tiff_to_csv(tiff_file: str, csv_file: str):
    '''
    Description:
        - This function converts a TIFF file to a CSV file.
    Parameters:
        tiff_file:
            - the path to the TIFF file to be converted
        csv_file:
            - the path to save the CSV file
    Returns:
        None
    '''
    # Open the TIFF file
    tif_image = Image.open(tiff_file)
    # Convert my image to a numpy array
    image_array = np.array(tif_image)
    # Flatten the array to convert it to a 1D vector
    vector_data = image_array.flatten()
    print(len(vector_data))
    # Save the pixel values to a CSV file
    np.savetxt(csv_file, vector_data, delimiter=",")

    with rasterio.open(tiff_file) as src:
        band1 = src.read(1)
        no_data = src.nodata
        print(f'no_data: {no_data}')
        # Generate shapes (polygons) from the raster values;
        # shapes() yields (geometry, value) pairs
        results = (
            {'properties': {'raster_val': v}, 'geometry': shape(s)}
            for s, v in shapes(band1, transform=src.transform)
        )

    # Convert my shapes to a GeoDataFrame (pass the features, not the raw array)
    gdf = gpd.GeoDataFrame.from_features(results)
    total_shapes = len(gdf)
    print(f"Number of shapes (including nodata): {total_shapes}")
Interestingly, there are no 'no_data' values, but the two lengths definitely do not match.
When you vectorize with rasterio, adjacent pixels with the same value are dissolved into a single polygon; that is why you get fewer polygons than raster pixels.
You can add a random float between 0 and 1 to each value in the raster array to make each pixel value unique (hopefully, unless you are very unlucky), then vectorize into a GeoPandas dataframe and floor the values back to the original values:
import rasterio
from rasterio.features import shapes
import geopandas as gpd
import numpy as np

raster_file = r"C:\Users\bera\Desktop\gistest\random_raster.tif"

# Vectorize the raster into a GeoPandas dataframe
with rasterio.open(raster_file) as src:
    array = src.read(1)
    # Keep the transform so it can be reused after the file is closed
    transform = src.transform
    print(array.shape)
    # (26, 33). There are 858 pixels in my raster

results = (
    {'properties': {'raster_val': v}, 'geometry': s}
    for s, v in shapes(source=array, mask=None, transform=transform))
geoms = list(results)
df = gpd.GeoDataFrame.from_features(geoms)
print(df.shape)
# (679, 2)
# So 26*33 - 679 = 179 cells are missing
df.to_file(r"C:\gistest\vectorized.gpkg")

# Add a random float between 0 and 1 to each array value
random_floats = np.random.rand(*array.shape)
array_with_random_decimals = (array + random_floats).astype("float32")

# Vectorize again: neighbouring pixels now (almost certainly) differ in value
results = (
    {'properties': {'raster_val': v}, 'geometry': s}
    for s, v in shapes(source=array_with_random_decimals, mask=None, transform=transform))
geoms = list(results)
df2 = gpd.GeoDataFrame.from_features(geoms)
print(df2.shape[0])
# 858. Which equals the number of values in the input raster.

# Floor the values back to the original values
df2["raster_val"] = np.floor(df2["raster_val"])
df2.to_file(r"C:\gistest\vectorized_with_random_addition.gpkg")