twhale

Reputation: 755

How to solve a ValueError when testing the truth value of DataFrame contents in Python?

I have a DataFrame that looks like this:

   done    sentence                        3_tags
0  0       ['What', 'were', 'the', '...]   ['WP', 'VBD', 'DT']
1  0       ['What', 'was', 'the', '...]    ['WP', 'VBD', 'DT']
2  0       ['Why', 'did', 'John', '...]    ['WP', 'VBD', 'NN']
...

For each row, I want to check whether the list in column '3_tags' is in the list temp1, as follows:

import pandas as pd

a = pd.read_csv('sentences.csv')
temp1 = [ ['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT'] ]
q = a['3_tags']
q in temp1  # this line raises the ValueError

For the first sentence (row 0), the value of '3_tags' is ['WP', 'VBD', 'DT'], which is in temp1, so I expect the result of the above to be:

True

However, I get this error:

ValueError: Arrays were different lengths: 1 vs 3

I suspect that there is some problem with the datatype of q:

print(type(q))
<class 'pandas.core.series.Series'>

Is the problem that q is a Series and temp1 contains lists? What should I do to get the logical result True?

Upvotes: 0

Views: 64

Answers (1)

piRSquared

Reputation: 294328

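You want those lists to be tuples instead: tuples are hashable, lists are not.
Then use pd.Series.isin, which tests every element of the Series for membership and returns a boolean Series.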

temp1 = list(map(tuple, temp1))  # turn each inner list into a tuple

q = a['3_tags'].apply(tuple)

q.isin(temp1)

0     True
1     True
2    False
Name: 3_tags, dtype: bool
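
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It assumes the column really holds Python lists. The small DataFrame and the names allowed, mask and list_is_hashable are made up for illustration and are not from the question's CSV.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: each cell is a real list.
a = pd.DataFrame({
    '3_tags': [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']]
})
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

# Lists are mutable, so Python refuses to hash them.
try:
    hash(temp1[0])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples are hashable; building a set also drops the duplicate entry in temp1.
allowed = set(map(tuple, temp1))

# Convert every cell to a tuple, then test membership element by element.
mask = a['3_tags'].apply(tuple).isin(list(allowed))
print(mask.tolist())
```

The result is a boolean Series with one entry per row, so mask.iloc[0] gives the single True the question asks about, and a[mask] selects only the matching rows.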

However, since the DataFrame is read from a CSV file, the '3_tags' column most likely consists of strings that look like lists. In that case, parse them with ast.literal_eval before converting to tuples:

from ast import literal_eval

temp1 = list(map(tuple, temp1))  # turn each inner list into a tuple

q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))

q.isin(temp1)

0     True
1     True
2    False
Name: 3_tags, dtype: bool
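
To see why the parsing step is needed, the sketch below round-trips a small DataFrame through CSV text in memory. The io.StringIO buffer is only a stand-in for sentences.csv, and the sample rows are assumed for illustration. After the round trip each cell comes back as a plain string, which literal_eval turns back into a list.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical data; the cells start out as real Python lists.
original = pd.DataFrame({
    'done': [0, 0, 0],
    '3_tags': [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']],
})

# Write to CSV text and read it back, as pd.read_csv would from a file.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
a = pd.read_csv(buffer)

# The lists did not survive the round trip: each cell is now a string.
first = a['3_tags'].iloc[0]
print(type(first).__name__, repr(first))

# Parse each string back into a list, then convert it to a hashable tuple.
q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))

temp1 = [('WP', 'VBD', 'DT'), ('WRB', 'JJ', 'VBZ')]
mask = q.isin(temp1)
print(mask.tolist())
```

If you control the read_csv call, passing converters={'3_tags': literal_eval} should do the parsing while the file is loaded, though I have not tested that against the original file.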

Setup 1 ('3_tags' holds actual lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN')))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

Setup 2 ('3_tags' holds strings that look like lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str, map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN'))))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

Upvotes: 1
