Reputation: 477
I have some code where, say, the following are the columns of my df:
import numpy as np

df.columns = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2', 'D1', 'D2', 'E1', 'E2']
letters = df.columns.str[:1]      # first character of each column name
letters = np.unique(letters)      # np.unique returns sorted values, not first-seen order
I am trying to get the unique values of the letters and of the numbers, in the order they first appear.
My code doesn't maintain that ordering and I can't figure out how to do so.
Thank you
expected output:
letters = ['A', 'B', 'C', 'D', 'E']
numbers = [1, 2]
Upvotes: 1
Views: 92
Reputation: 21674
This one uses a regex and keeps working even when your column names contain multiple letters or digits:
import re
import pandas as pd

df = pd.DataFrame(columns=['EE2', 'A1', 'A2', 'B1', 'B2', 'C1', 'C2', 'D1', 'D11', 'E1'])

# Split each column name into its non-digit and digit parts.
split_ = [re.findall(r'\d+|\D+', col) for col in df.columns]

list(pd.Series([col[0] for col in split_]).drop_duplicates())
# ['EE', 'A', 'B', 'C', 'D', 'E']

list(pd.Series([col[1] for col in split_]).drop_duplicates())
# ['2', '1', '11']
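If integers are wanted, as in the question's expected output, the de-duplicated strings can simply be cast afterwards; a small follow-up sketch reusing the same split_ list:

numbers = [int(n) for n in pd.Series([col[1] for col in split_]).drop_duplicates()]
# [2, 1, 11]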
Upvotes: 1
Reputation: 12157
Assuming your example is representative, you can use a neat little trick that I got from Raymond Hettinger: in Python 3.6 and later, dicts preserve insertion order, so you can use their keys as efficient ordered sets.
list(dict.fromkeys(c[0] for c in df.columns))
# --> ['A', 'B', 'C', 'D', 'E']
list(dict.fromkeys(int(c[1]) for c in df.columns))
# --> [1, 2]
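For clarity, a minimal standalone demo of the mechanism (dict keys keep first-seen insertion order, so duplicates collapse without reordering):

list(dict.fromkeys(['A', 'A', 'B', 'A', 'C']))
# --> ['A', 'B', 'C']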
Upvotes: 2
Reputation: 164773
You can use toolz.unique instead. This is identical to the unique_everseen recipe found in the itertools docs. Internally, it iterates while maintaining a set of seen items.
import pandas as pd
from toolz import unique

df = pd.DataFrame(columns=['A1', 'A2', 'B1', 'B2', 'C1', 'C2', 'D1', 'D2', 'E1', 'E2'])

res = list(unique(df.columns.str[:1]))
# ['A', 'B', 'C', 'D', 'E']
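For reference, a minimal sketch of the unique_everseen idea described above, assuming hashable items (the full itertools recipe also supports a key function):

def unique_everseen(iterable):
    # Yield items in first-seen order, skipping anything already seen.
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

list(unique_everseen(df.columns.str[:1]))
# ['A', 'B', 'C', 'D', 'E']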
A more Pandorable solution would be to convert the Index object to pd.Series and use drop_duplicates. This, again, uses hashing:
res = df.columns.str[:1].to_series().drop_duplicates().values
# array(['A', 'B', 'C', 'D', 'E'], dtype=object)
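If a plain Python list is preferred over the NumPy array, .tolist() can replace .values:

res = df.columns.str[:1].to_series().drop_duplicates().tolist()
# ['A', 'B', 'C', 'D', 'E']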
Upvotes: 1