Reputation: 4241
New to pandas python.
I have a dataframe (df) with two columns of cusips. I want to turn those columns into a list of the unique entries of the two columns.
My first attempt was to do the following:
cusips = pd.concat(df['long'], df['short']).
This returned the error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
I have read a few postings, but I am still having trouble with why this comes up. What am I missing here?
Also, what's the most efficient way to select the unique entries in a column or a dataframe? Can I call it in one function? Does the function differ if I want to create a list or a new, one-coulmn dataframe?
Thank you.
Upvotes: 1
Views: 2380
Reputation: 375435
To obtain the unique values in a column you can use the unique
Series method, which will return a numpy array of the unique values (and it is fast!).
df.long.unique()
# returns numpy array of unique values
You could then use numpy.append
:
np.append(df.long.unique(), df.short.unique())
Note: This just appends the two unique results together and so itself is not unique!
.
Here's a (trivial) example:
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 2], [1, 4]], columns=['long','short'])
In [4]: df
Out[4]:
long short
0 1 2
1 1 4
In [5]: df.long.unique()
Out[5]: array([1])
In [6]: df.short.unique()
Out[6]: array([2, 4])
And then appending the resulting two arrays:
In [7]: np.append(df.long.unique(), df.short.unique())
Out[7]: array([1, 2, 4])
Using @Zalazny7's set
is significantly faster (since it runs over the array only once) and somewhat upsettingly it's even faster than np.unique
(which sorts the resulting array!).
Upvotes: 1
Reputation: 40628
Adding to Hayden's answer, you could also use the set()
method for the same result. The performance is slightly better if that's a consideration:
In [28]: %timeit set(np.append(df[0],df[1]))
100000 loops, best of 3: 19.6 us per loop
In [29]: %timeit np.append(df[0].unique(), df[1].unique())
10000 loops, best of 3: 55 us per loop
Upvotes: 1