Nishant

Reputation: 1121

How to get the width and height of all jpg files in a folder

I have a folder, poster_folder, containing jpg files, for example 1.jpg, 2.jpg, 3.jpg.

The path to this folder is:

from pathlib import Path
from PIL import Image

images_dir = Path('C:\\Users\\HP\\Desktop\\PGDinML_AI_IIITB\\MS_LJMU\\Dissertation topics\\Project_2_Classification of Genre for Movies using Machine Leaning and Deep Learning\\Final_movieScraping_data_textclasification\\posters_final').expanduser()

I have a data frame that lists the jpg image for each movie:

df_subset_cleaned_poster.head(3)

movie_name  movie_image

Lion_king   1.jpg
avengers    2.jpg
iron_man    3.jpg

I am trying to draw a scatter plot of the width and height of all jpg files in the folder (they have different resolutions), as below:

height, width = np.empty(len(df_subset_cleaned_poster)), np.empty(len(df_subset_cleaned_poster))

for i in range(len(df_subset_cleaned_poster.movie_image)):
    w, h = Image.open(images_dir.joinpath(df_subset_cleaned_poster['movie_image'][i])).size
    width[i], height[i] = w, h
plt.scatter(width, height, alpha=0.5)
plt.xlabel('Width'); plt.ylabel('Height'); plt.show() 

This throws an error: KeyError: 208


df_subset_cleaned_poster.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10225 entries, 0 to 10986
Data columns (total 2 columns):
movie_name                  10225 non-null object
movie_image                 10225 non-null object
dtypes: object(2)

Upvotes: 0

Views: 193

Answers (1)

Lydia van Dyke

Reputation: 2516

As discussed in the comments: the issue seems to be in the creation of the dataframe or in the csv file itself.

I was able to create a proper scatter plot with the following code:


from pathlib import Path

import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
from io import StringIO

if __name__ == '__main__':
    images_dir = Path("../data/images")

    infile = StringIO("""movie_name,movie_image
Lion_king,1.jpg
avengers,2.jpg
iron_man,3.jpg
""")

    df_subset_cleaned_poster = pd.read_csv(infile)

    n = len(df_subset_cleaned_poster)
    height, width = np.empty(n), np.empty(n)

    for i, filename in enumerate(df_subset_cleaned_poster.movie_image):
        w, h = Image.open(images_dir / filename).size
        width[i], height[i] = w, h

    plt.scatter(width, height, alpha=0.5)
    plt.xlabel('Width')
    plt.ylabel('Height')
    plt.show()

I suggest that you use this code as the starting point for further experiments. I use enumerate to iterate over the values of df_subset_cleaned_poster.movie_image, so i is a running position rather than an index label. This should make the loop more robust against lookup errors such as the KeyError in the question.
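A likely explanation for the KeyError, going by the info() output in the question: the frame has 10225 entries but its index labels run from 0 to 10986, so some labels (apparently including 208) are missing, and df['movie_image'][i] looks rows up by label, not by position. Here is a minimal sketch of that difference. The four-row frame and the helper lookup_by_label are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical miniature of the cleaned frame: the row with label 1 is dropped,
# so the index labels are 0, 2, 3 instead of 0, 1, 2.
df = pd.DataFrame(
    {
        "movie_name": ["Lion_king", "avengers", "iron_man", "thor"],
        "movie_image": ["1.jpg", "2.jpg", "3.jpg", "4.jpg"],
    }
).drop(index=1)


def lookup_by_label(frame, i):
    """Return the image name stored under index label i, or None if that label is missing."""
    try:
        return frame["movie_image"][i]  # label-based lookup on an integer index
    except KeyError:
        return None


# Looping over range(len(df)) with label lookup hits the gap at label 1.
by_label = [lookup_by_label(df, i) for i in range(len(df))]

# .iloc is purely positional, so gaps in the labels do not matter.
by_position = [df["movie_image"].iloc[i] for i in range(len(df))]

# Alternatively, renumber the rows once so that labels and positions agree.
after_reset = df.reset_index(drop=True)
```

In the question's loop, either switching to .iloc[i] or calling reset_index(drop=True) on the cleaned frame beforehand should have the same effect as iterating with enumerate.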

As you can see, I replaced the input file with a mock string wrapped in StringIO. To use the real data again, replace it with infile = open("your_file.txt").
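Once the real file is in use, some names in the frame may have no matching file on disk. One way to deal with that is sketched below, under assumptions: collect_sizes is a hypothetical helper (not something pandas or PIL provide), and silently skipping missing files is a design choice you may want to replace with a warning.

```python
from pathlib import Path

import pandas as pd
from PIL import Image


def collect_sizes(images_dir, filenames):
    """Return a DataFrame with one row (movie_image, width, height) per image found on disk."""
    rows = []
    for filename in filenames:
        path = Path(images_dir) / filename
        if not path.is_file():
            continue  # name is listed in the frame but the file is absent
        with Image.open(path) as img:  # context manager closes the file handle
            w, h = img.size  # PIL reports size as (width, height)
        rows.append({"movie_image": filename, "width": w, "height": h})
    return pd.DataFrame(rows, columns=["movie_image", "width", "height"])
```

Usage would look like sizes = collect_sizes(images_dir, df_subset_cleaned_poster.movie_image), followed by plt.scatter(sizes.width, sizes.height, alpha=0.5).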

Upvotes: 1
