Efficient/Pythonic way to create lists from pandas Dataframe column

Question

I have a DataFrame as shown below.

import pandas as pd

df = pd.DataFrame({
    'code' : [1,   2,  3,  4,  5,  6,  7,  8,  9,  10],
    'Tag' :  ['A','B','C','D','B','C','D','A','D','C']
})

+------+-----+
| code | Tag |
+------+-----+
|   1  |  A  |
+------+-----+
|   2  |  B  |
+------+-----+
|   3  |  C  |
+------+-----+
|   4  |  D  |
+------+-----+
|   5  |  B  |
+------+-----+
|   6  |  C  |
+------+-----+
|   7  |  D  |
+------+-----+
|   8  |  A  |
+------+-----+
|   9  |  D  |
+------+-----+
|  10  |  C  |
+------+-----+

My objective is to create one list of codes for each distinct value in the Tag column, as below.

codes_A = [1,8]
codes_B = [2,5]
codes_C = [3,6,10]
codes_D = [4,7,9]

This is how I'm doing it right now:

codes_A = df[df['Tag'] == 'A']['code'].to_list()
codes_B = df[df['Tag'] == 'B']['code'].to_list()
codes_C = df[df['Tag'] == 'C']['code'].to_list()
codes_D = df[df['Tag'] == 'D']['code'].to_list()

This code does the job, but as you can see it is cumbersome and inefficient. I repeat the same line for every tag, and I have to repeat it again whenever I want to create a new list.

Is there a more efficient and Pythonic way to do this in pandas or numpy?

jezrael · Accepted Answer

Create a dictionary of lists, because creating variable names dynamically is not recommended:

d = df.groupby('Tag')['code'].agg(list).to_dict()
print (d)
{'A': [1, 8], 'B': [2, 5], 'C': [3, 6, 10], 'D': [4, 7, 9]}
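
For reference, here is a self-contained sketch of the same grouping, assuming the df from the question. The dict comprehension (d_alt) is only an alternative spelling added for illustration, not part of the original answer; both should give the same mapping.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Tag':  ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'A', 'D', 'C'],
})

# Group the rows by Tag and collect the codes of each group into a list
d = df.groupby('Tag')['code'].agg(list).to_dict()

# Alternative spelling (illustrative): iterate over the (tag, Series) pairs
# that the groupby yields and convert each Series to a plain list
d_alt = {tag: codes.tolist() for tag, codes in df.groupby('Tag')['code']}

print(d)
print(d_alt)
```

Either way the frame is grouped once, instead of being filtered once per tag as in the question.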

Then look up each list by its key in the dict, instead of assigning it to a separate variable name:

print (d['A'])
[1, 8]

In practice this means that wherever your code uses codes_A, you change it to d['A'], and similarly for the other variables.
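
A minimal sketch of what working with the dictionary instead of separate variables might look like. The tag 'Z' and the names codes_for_a, codes_for_z and counts are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'code': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Tag':  ['A', 'B', 'C', 'D', 'B', 'C', 'D', 'A', 'D', 'C'],
})
d = df.groupby('Tag')['code'].agg(list).to_dict()

# Wherever the original code used codes_A, use d['A'] instead
codes_for_a = d['A']

# d['Z'] would raise KeyError because no row has Tag 'Z' (hypothetical tag);
# .get() returns the supplied default instead
codes_for_z = d.get('Z', [])

# All groups can be handled in one loop, with no per-tag variables
counts = {tag: len(codes) for tag, codes in d.items()}

print(codes_for_a, codes_for_z, counts)
```

New tags that appear in the data later are picked up automatically, with no extra line of code per tag.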

But if you really need separate variables:

for k, v in d.items():
    # creates module-level names codes_A, codes_B, ... (works, but not recommended)
    globals()[f'codes_{k}'] = v

print (codes_A)
[1, 8]
