Vivek Jha

Reputation: 257

Pythonic way to loop over dictionary

I am practicing Pandas and have the following task:

Create a list whose elements are the # of columns of each .csv file


The .csv files are stored in the dictionary directory, keyed by year.

I use a dictionary comprehension to build dataframes (again keyed by year), storing each .csv file as a pandas DataFrame:

directory = {2009: 'path_to_file/data_2009.csv', ... , 2018: 'path_to_file/data_2018.csv'}

dataframes = {year: pandas.read_csv(file) for year, file in directory.items()}

# My Approach 1 
columns = [df.shape[1] for year, df in dataframes.items()]

# My Approach 2
columns = [dataframes[year].shape[1] for year in dataframes]

Which way is more "Pythonic"? Or is there a better way to approach this?
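For a quick self-contained comparison, here is a sketch of both approaches using small in-memory CSVs (via io.StringIO, standing in for the files on disk):

```python
import io
import pandas

# Stand-ins for the files on disk: two tiny CSVs keyed by year.
directory = {
    2009: io.StringIO("a,b,c\n1,2,3\n"),
    2010: io.StringIO("a,b\n4,5\n"),
}
dataframes = {year: pandas.read_csv(f) for year, f in directory.items()}

# Approach 1: iterate over (year, df) pairs.
columns1 = [df.shape[1] for year, df in dataframes.items()]

# Approach 2: iterate over keys and index back into the dict.
columns2 = [dataframes[year].shape[1] for year in dataframes]

print(columns1)  # [3, 2]
print(columns2)  # [3, 2]
```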

Upvotes: 4

Views: 450

Answers (4)

Shihe Zhang

Reputation: 2771

import os

# List the files under the data directory; filter this if other files are present.
target_files = os.listdir('path_to_file/')
columns = []
for filename in target_files:
    # In your scenario @piRSquared's answer would be more efficient.
    columns.append(...)  # compute the column count for each file here

If you want columns keyed by year taken from the filename, you can extract the year with a regular expression (note that str.replace does not accept regex patterns; use re.sub) and update the dictionary like this:

year = re.sub(r'[^0-9]', '', filename)
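A runnable sketch of that filename-to-year idea (the filenames here are made up for illustration):

```python
import re

filenames = ['data_2009.csv', 'data_2018.csv']  # hypothetical directory listing
by_year = {}
for filename in filenames:
    # Strip every non-digit character, leaving just the year.
    year = int(re.sub(r'[^0-9]', '', filename))
    by_year[year] = filename

print(by_year)  # {2009: 'data_2009.csv', 2018: 'data_2018.csv'}
```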

Upvotes: 2

privatevoid

Reputation: 192

Your Approach 2:

columns = [dataframes[year].shape[1] for year in dataframes]

is more Pythonic and concise: the keys are implied in the comprehension, and shape[1] gives the number of columns. It also fits well with future use of dataframes in merging, plotting, manipulating, etc.

Upvotes: 4

theSanjeev

Reputation: 149

You could use:

columns = [len(dataframe.columns) for dataframe in dataframes.values()]

As @piRSquared mentioned, if your only objective is to get the number of columns, you shouldn't read the entire csv file; instead, use the nrows keyword argument of the read_csv function.
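A sketch of that suggestion: with nrows=0, read_csv parses only the header row, so you get the column names without loading any data (an in-memory CSV stands in for a file on disk here):

```python
import io
import pandas

# Stand-in for a file on disk; read_csv treats it the same way.
csv_data = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")

header_only = pandas.read_csv(csv_data, nrows=0)  # parse the header only
print(header_only.shape[1])  # 4
```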

Upvotes: 3

piRSquared

Reputation: 294228

Your method will get it done, but I don't like reading in the entire file and creating a dataframe just to count the columns. You could do the same thing by reading just the first line of each file and counting the number of commas. Notice that I add 1 because there will always be one less comma than there are columns.

columns = [open(f).readline().count(',') + 1 for f in directory.values()]
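One caveat worth noting: counting commas overcounts if a header field itself contains a quoted comma. The stdlib csv module handles quoting correctly; a small sketch of the difference:

```python
import csv
import io

# A header whose second field contains a quoted comma.
line = 'id,"last, first",score\n'

ncols_by_comma = line.count(',') + 1                      # counts the quoted comma too
ncols_by_csv = len(next(csv.reader(io.StringIO(line))))   # respects the quoting

print(ncols_by_comma)  # 4
print(ncols_by_csv)    # 3
```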

Upvotes: 4
