Two Columns of a pandas dataframe - Concat in Python

Question

I'm new to pandas in Python.

I have a dataframe (df) with two columns of CUSIPs. I want to combine those columns into a single list of the unique entries across both columns.

My first attempt was to do the following:

cusips = pd.concat(df['long'], df['short'])

This raised the error: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()."

I have read a few posts, but I still don't understand why this error comes up. What am I missing here?
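pd.concat takes a single sequence of pandas objects (for example, a list) as its first argument, so the two columns need to be wrapped in a list. In the call above, df['short'] is passed as a second positional argument. In older pandas versions that slot was the axis parameter, and evaluating a whole Series as a truth value is most likely what produced the "ambiguous" error; recent releases reject the call with a TypeError instead. A minimal sketch, using hypothetical placeholder values rather than real CUSIPs:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's two CUSIP columns
df = pd.DataFrame({
    "long": ["CUSIP_A", "CUSIP_B", "CUSIP_A"],
    "short": ["CUSIP_B", "CUSIP_C", "CUSIP_D"],
})

# Failing form: the second Series lands in a parameter slot that is not
# meant for another object to concatenate (the error type depends on the
# pandas version, so both possibilities are caught here)
try:
    pd.concat(df["long"], df["short"])
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# Working form: pass ONE list containing both Series
combined = pd.concat([df["long"], df["short"]], ignore_index=True)
print(combined.tolist())
# ['CUSIP_A', 'CUSIP_B', 'CUSIP_A', 'CUSIP_B', 'CUSIP_C', 'CUSIP_D']
```

ignore_index=True gives the combined Series a fresh 0..n-1 index; without it, the index labels from the two columns would repeat.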

Also, what's the most efficient way to select the unique entries in a column or a dataframe? Can I do it with a single function call? Does the function differ if I want to create a list rather than a new, one-column dataframe?
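For the follow-up questions, a minimal sketch (again with hypothetical placeholder data): once the columns are stacked, Series.unique() returns the unique values as an array in one call, while Series.drop_duplicates() keeps them as a Series, from which .tolist() gives a list and .to_frame() gives a one-column DataFrame. The column name "cusip" is an illustrative choice, not something from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CUSIP columns
df = pd.DataFrame({
    "long": ["CUSIP_A", "CUSIP_B", "CUSIP_A"],
    "short": ["CUSIP_B", "CUSIP_C", "CUSIP_D"],
})

# Stack both columns into one Series
stacked = pd.concat([df["long"], df["short"]], ignore_index=True)

# One call for an array of unique values, in order of first appearance
unique_array = stacked.unique()

# A plain Python list of the unique values
cusip_list = stacked.drop_duplicates().tolist()

# A new one-column DataFrame of the unique values ("cusip" is a hypothetical name)
cusip_frame = stacked.drop_duplicates().reset_index(drop=True).to_frame(name="cusip")

print(cusip_list)  # ['CUSIP_A', 'CUSIP_B', 'CUSIP_C', 'CUSIP_D']
```

Both unique() and drop_duplicates() keep values in order of first appearance rather than sorting them.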

Thank you.

Zelazny7 · Accepted Answer

Adding to Hayden's answer, you could also use the built-in set() function to get the unique entries. If performance is a consideration, it is also faster; in the timing below it ran in roughly a third of the time:

In [28]: %timeit set(np.append(df[0],df[1]))
100000 loops, best of 3: 19.6 us per loop

In [29]: %timeit np.append(df[0].unique(), df[1].unique())
10000 loops, best of 3: 55 us per loop
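The two timed expressions do not return quite the same thing. A minimal sketch using the question's 'long' and 'short' column names (the timings above use integer labels 0 and 1) and hypothetical data: set() removes every duplicate but returns an unordered Python set, while appending each column's unique() output still leaves a value that appears in both columns twice. Wrapping the appended array in np.unique is one way to remove those, at the cost of sorting the result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "CUSIP_B" appears in both columns
df = pd.DataFrame({
    "long": ["CUSIP_A", "CUSIP_B", "CUSIP_A"],
    "short": ["CUSIP_B", "CUSIP_C", "CUSIP_D"],
})

# set() over all appended values: every duplicate removed, order not preserved
as_set = set(np.append(df["long"], df["short"]))

# Per-column uniques appended: duplicates WITHIN a column are removed,
# but a value present in both columns appears twice
appended = np.append(df["long"].unique(), df["short"].unique())

# np.unique removes the remaining cross-column duplicates (result is sorted)
sorted_unique = np.unique(appended)

print(len(appended))  # 5
```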
