ShanZhengYang

Reputation: 17621

Error filtering lists in pandas dataframe

Normally, one filters a pandas DataFrame as follows:

import pandas as pd
df = pd.read_csv(...)
df_filtered = df[df['column'] == value]

I have the following dataframe df1:

numbers    letters   other_columns
0          [A]     ....
1          [A]     ....
2          [C]     ....
3          [B]     ....
4          [B]     ....
5          [A]     ....
...        ....

I thought that the entries in letters were strings, but these are actually lists:

type(df1.letters.iloc[0])

outputs list

So I tried to filter the dataframe df1 to keep only the [A] rows.

That is, only_A should look like:

numbers    letters   other_columns
0          [A]     ....
1          [A]     ....
5          [A]     ....
...        ....

However, if I try to filter with the code

only_A = df1[df1['letters'] == list('A')]

I get a ValueError:

ValueError: Arrays were different lengths: 3076 vs 1

What is the correct way to filter this dataframe?

Upvotes: 2

Views: 982

Answers (2)

root

Reputation: 33783

You can use Series.str.join to do the filtering without changing the DataFrame.

df[df['letters'].str.join('') == 'A']
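As a minimal sketch of how this plays out, assuming a small made-up frame shaped like the question's df1 (the sample values below are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data mirroring the layout of df1 in the question
df1 = pd.DataFrame({
    "numbers": [0, 1, 2, 3, 4, 5],
    "letters": [["A"], ["A"], ["C"], ["B"], ["B"], ["A"]],
})

# Series.str.join concatenates the elements of each list into one string,
# so the result can be compared against a scalar like any string column
joined = df1["letters"].str.join("")

# The original column is left untouched; only the boolean mask uses strings
only_A = df1[joined == "A"]
print(only_A)
```

Note that with an empty separator a multi-element list such as ['A', 'B'] becomes 'AB', so it would not match 'A'.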

Upvotes: 2

flyingmeatball

Reputation: 7997

If you thought the contents of letters were strings, could you convert the column of lists to strings? Something like this:

df['letters'] = df['letters'].apply(lambda x: ''.join(x))

Then proceed to filter as you normally would.
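Put together, the conversion and the usual filter might look like the sketch below. The sample frame is an assumption made up for illustration; only the apply/join line comes from the answer itself.

```python
import pandas as pd

# Hypothetical sample data; each entry in "letters" is a list of strings
df = pd.DataFrame({
    "numbers": [0, 1, 2, 3, 4, 5],
    "letters": [["A"], ["A"], ["C"], ["B"], ["B"], ["A"]],
})

# Replace each list with the string formed by joining its elements
df["letters"] = df["letters"].apply(lambda x: "".join(x))

# The column now holds plain strings, so ordinary boolean filtering works
only_A = df[df["letters"] == "A"]
print(only_A)
```

Unlike a filter that only builds a temporary mask, this overwrites the letters column, so the original lists are no longer available afterwards.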

Upvotes: 2
