Reputation: 1863
I'm trying to write something that answers "what are the possible values in every column?"
I created a dictionary called all_col_vals and iterate over the column positions, from 0 up to however many columns my dataframe has. However, when reading about this online, someone said this looks too much like Java and that the more Pythonic way would be to use zip. I can't see how I could use zip here.
all_col_vals = {}
for index in range(RCSRdf.shape[1]):
    all_col_vals[RCSRdf.iloc[:,index].name] = set(RCSRdf.iloc[:,index])
The output looks like 'CFN Network': {nan, 'N521', 'N536', 'N401', 'N612', 'N204'}, 'Exam': {'EXRC', 'MXRN', 'HXRT', 'MXRC'}
and shows all the possible values for each column. The key is the column name.
Upvotes: 3
Views: 161
Reputation: 3305
I think @piRSquared's comment is the best option, so I'm going to steal it as an answer and add some explanation.
Assuming you don't have duplicate columns, use the following:
{k : {*df[k]} for k in df}
k represents a column name in df. You don't have to use the .columns attribute to get the names, because iterating over a pandas.DataFrame yields its column names, much like iterating over a Python dict yields its keys.
df[k] is the Series for column k.
{*df[k]} unpacks the values of that Series into a set literal ({...}), which by definition keeps only distinct elements (see the definition of a set).
Lastly, using a dict comprehension to create the dict is more concise, and usually a little faster, than defining an empty dict and adding new keys to it in a for-loop.
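To make the explanation above concrete, here is a minimal sketch that runs the loop from the question and the comprehension side by side. The sample DataFrame is made up for illustration (it only borrows the column names from the question and leaves out the NaN values), so treat it as an assumption rather than the asker's real data.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the columns in the question.
df = pd.DataFrame({
    "CFN Network": ["N521", "N536", "N521", "N401"],
    "Exam": ["EXRC", "MXRN", "EXRC", "HXRT"],
})

# Loop version from the question: add one key per column position.
loop_result = {}
for index in range(df.shape[1]):
    column = df.iloc[:, index]
    loop_result[column.name] = set(column)

# Comprehension version: iterating over df yields its column names,
# and {*df[k]} unpacks that column's values into a set.
all_col_vals = {k: {*df[k]} for k in df}

print(all_col_vals == loop_result)    # True
print(sorted(all_col_vals["Exam"]))   # ['EXRC', 'HXRT', 'MXRN']
```

Both versions build the same dict; the comprehension just avoids the positional indexing with .iloc and the separate empty-dict setup.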
Upvotes: 10