Reputation: 17621
Normally, one filters a pandas DataFrame as follows:
import pandas as pd
df = pd.read_csv(...)
df_filtered = df[df['column'] == value]
I have the following dataframe df1:
numbers letters other_columns
0 [A] ....
1 [A] ....
2 [C] ....
3 [B] ....
4 [B] ....
5 [A] ....
... ....
I thought that the entries in letters were strings, but they are actually lists:
type(df1.letters.iloc[0])
outputs list
So, I tried to filter the dataframe df1 to only have [A] rows. That is, only_A should look like:
numbers letters other_columns
0 [A] ....
1 [A] ....
5 [A] ....
... ....
However, if I try to filter with the code
only_A = df1[df1['letters'] == list('A')]
I get a ValueError:
ValueError: Arrays were different lengths: 3076 vs 1
What is the correct way to filter this dataframe?
Upvotes: 2
Views: 982
Reputation: 33783
You can use Series.str.join to do the filtering without changing the DataFrame.
df[df['letters'].str.join('') == 'A']
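As a minimal sketch of the line above, here is a self-contained version. The small frame is a made-up stand-in for the asker's df1 (the real data is not shown in the question), and the name mask is only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for df1: every entry in 'letters' is a one-element list.
df1 = pd.DataFrame({
    "numbers": [0, 1, 2, 3, 4, 5],
    "letters": [["A"], ["A"], ["C"], ["B"], ["B"], ["A"]],
})

# Join each list into a string for the comparison only;
# the 'letters' column in df1 still holds lists afterwards.
mask = df1["letters"].str.join("") == "A"
only_A = df1[mask]

print(only_A)
```

The original attempt, df1['letters'] == list('A'), most likely fails because pandas treats the one-element list as an array to compare element by element against the whole column, rather than as a single value to match in each row. Note that joining with '' assumes the lists hold strings; a row such as ['A', 'B'] would become 'AB' and would not match.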
Upvotes: 2
Reputation: 7997
If you thought the contents of letters were strings, could you convert the column of lists to strings? Something like this:
df['letters'] = df['letters'].apply(lambda x: ''.join(x))
Then proceed to filter as you normally would.
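Put together, the two steps might look like the sketch below. The frame is a hypothetical example, not the asker's data, and it assumes every list in letters contains only strings.

```python
import pandas as pd

# Hypothetical example frame; 'letters' starts out as a column of lists.
df = pd.DataFrame({
    "numbers": [0, 1, 2, 3, 4, 5],
    "letters": [["A"], ["A"], ["C"], ["B"], ["B"], ["A"]],
})

# Step 1: replace each list with the string formed by joining its elements.
df["letters"] = df["letters"].apply(lambda x: "".join(x))

# Step 2: filter with an ordinary equality comparison.
df_filtered = df[df["letters"] == "A"]

print(df_filtered)
```

Unlike the str.join filter, this overwrites the column, so the original lists are no longer available afterwards.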
Upvotes: 2