Christina Zhou

Reputation: 1863

Pythonic way to create a dictionary by iterating

I'm trying to write something that answers "what are the possible values in every column?"

I created a dictionary called all_col_vals and iterate over the column positions, from 0 up to however many columns my dataframe has. However, when reading about this online, I saw someone say this looks too much like Java and that the more Pythonic way would be to use zip. I can't see how I could use zip here.

all_col_vals = {}
for index in range(RCSRdf.shape[1]):
    all_col_vals[RCSRdf.iloc[:,index].name] = set(RCSRdf.iloc[:,index])

The output looks like 'CFN Network': {nan, 'N521', 'N536', 'N401', 'N612', 'N204'}, 'Exam': {'EXRC', 'MXRN', 'HXRT', 'MXRC'} and shows all the possible values for that specific column. The key is the column name.

Upvotes: 3

Views: 161

Answers (1)

Ian Thompson

Reputation: 3305

I think @piRSquared's comment is the best option, so I'm going to steal it as an answer and add some explanation.

Answer

Assuming you don't have duplicate column names (with duplicates, df[k] returns a DataFrame rather than a Series), use the following:

{k : {*df[k]} for k in df}
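To see it in action, here is a minimal sketch on a small made-up DataFrame. The data is hypothetical (the column names only echo the question's output), and the loop is the question's approach rewritten with column labels instead of positions, so the two results can be compared.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "CFN Network": ["N521", "N536", "N521", "N401"],
    "Exam": ["EXRC", "MXRN", "EXRC", "EXRC"],
})

# The question's loop, using column labels rather than integer positions.
loop_result = {}
for col in df.columns:
    loop_result[col] = set(df[col])

# The dict comprehension from this answer.
all_col_vals = {k: {*df[k]} for k in df}

# Both approaches build the same mapping of column name -> distinct values.
print(all_col_vals == loop_result)
```

Each key is a column name and each value is the set of distinct entries found in that column.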

Explanation

k represents a column name in df. You don't have to use the .columns attribute to get the names, because iterating over a pandas.DataFrame yields its column labels, much like iterating over a Python dict yields its keys.

df[k] is the Series for column k.

{*df[k]} unpacks the values from the series and places them in a set (the surrounding braces form a set literal), which only keeps distinct elements by definition (see definition of a set).
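The unpacking syntax works on any iterable, not only on a Series. A small plain-Python sketch (the list of values is made up for illustration) shows that {*values} matches set(values), and that empty braces create a dict rather than a set:

```python
# Hypothetical values standing in for one column of the dataframe.
values = ["EXRC", "MXRN", "EXRC", "HXRT"]

unpacked = {*values}   # set literal with iterable unpacking
called = set(values)   # equivalent constructor call

# Empty braces are a dict; an empty set needs set().
empty_braces = {}
empty_set = set()

print(unpacked == called)
print(type(empty_braces).__name__, type(empty_set).__name__)
```

Choosing between {*df[k]} and set(df[k]) is mostly a matter of taste; both produce the same set.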

Lastly, using a dict comprehension to create the dict is more concise, and usually somewhat faster, than defining an empty dict and adding new keys to it in a for-loop.
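If you want to check the speed difference on your own machine, a rough sketch with timeit could look like the following. It uses a plain dict of lists as a hypothetical stand-in for the dataframe, so the numbers only indicate the overhead of the two construction styles, not pandas itself, and they will vary between runs.

```python
import timeit

# Hypothetical stand-in for a dataframe: column name -> list of values.
columns = {f"col{i}": [i, i + 1, i] for i in range(100)}

def with_loop():
    # Start from an empty dict and add one key per column.
    result = {}
    for name in columns:
        result[name] = set(columns[name])
    return result

def with_comprehension():
    # Build the same mapping in a single dict comprehension.
    return {name: {*columns[name]} for name in columns}

loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f"loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```

For a handful of columns the difference is small either way, so readability is the stronger argument for the comprehension.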

Upvotes: 10
