TJE

Reputation: 580

How to reference a value in a dictionary if the value is a list of strings?

I'm collecting data from Twitter, and each tweet is in the form of a dictionary.

My full dataset is a list of thousands of tweets (a list of dictionaries).

I want to reference the hashtags within each tweet but I need help figuring out how to do this.

Here's an example of a list of two partial tweets with relevant data:

twitter_tweets = [
    {'created_at': 'Wed Oct 18 22:20:30 +0000 2017', 'id': 920776631102214144, 'entities': {'hashtags': ['#dataanalyst#', '#politics']}, 'user': {'id': 119116331, 'statuses_count': 32796, 'verified': False, 'lang': 'en-'}, 'retweet_count': 0, 'favorite_count': 0},
    {'created_at': 'Wed Oct 17 12:20:36 +0000 2017', 'id': 920776631106514144, 'entities': {'hashtags': ['#california', '#nationalparks']}, 'user': {'id': 119159331, 'statuses_count': 32796, 'verified': False, 'lang': 'en-gb'}, 'retweet_count': 1, 'favorite_count': 2},
]

Notice that the "entities" key has as its value a second dictionary. In that second dictionary, "hashtags" is the key and the value is a list of hashtags.

Here is my attempt to collect all of these hashtags into a list and build a frequency series from it:

def make_tweets_series(input_list, first_key, second_key):
    final_keys_list = []
    for line in input_list:
        tweets_by_key = line[first_key][second_key]
        final_keys_list.append(tweets_by_key)
        series_key_values = pd.Series(final_keys_list).value_counts()

    return series_key_values


hashtag_distribution_series = make_tweets_series(twitter_tweets, 'entities', 'hashtags')

This code would work, I think, if the "hashtags" value were a single string, but it fails because "hashtags" is a list of strings.

How can I reference each of the hashtags in these lists and put them into a Series?

My full error message, with the traceback, is as follows:

Traceback (most recent call last):

  File "<ipython-input-60-7623feb35c84>", line 13, in <module>
    hashtag_distribution_series = make_tweets_series(twitter_tweets, 'entities', 'hashtags')

  File "<ipython-input-60-7623feb35c84>", line 6, in make_tweets_series
    series_key_values = pd.Series(final_keys_list).value_counts()

  File "/home/tommy/anaconda3/lib/python3.6/site-packages/pandas/core/base.py", line 938, in value_counts
    normalize=normalize, bins=bins, dropna=dropna)

  File "/home/tommy/anaconda3/lib/python3.6/site-packages/pandas/core/algorithms.py", line 640, in value_counts
    keys, counts = _value_counts_arraylike(values, dropna)

  File "/home/tommy/anaconda3/lib/python3.6/site-packages/pandas/core/algorithms.py", line 685, in _value_counts_arraylike
    keys, counts = f(values, dropna)

  File "pandas/_libs/hashtable_func_helper.pxi", line 356, in pandas._libs.hashtable.value_count_object (pandas/_libs/hashtable.c:29440)

  File "pandas/_libs/hashtable_func_helper.pxi", line 367, in pandas._libs.hashtable.value_count_object (pandas/_libs/hashtable.c:29189)

TypeError: unhashable type: 'list'

Upvotes: 1

Views: 89

Answers (1)

hyperneutrino

Reputation: 5425

list is unhashable

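The error means exactly what it says: you cannot hash a list object. dicts use the hash value of an object to map key -> value because lookups are faster that way, and, as your traceback shows, value_counts goes through pandas' hashtable code, so every element of the Series has to be hashable too.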
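A minimal sketch of that failure in plain Python, with no pandas involved. The variable names are only illustrative, and the hashtag list is copied from the question's sample data.

```python
# A list of hashtags, as found under tweet['entities']['hashtags'].
hashtags = ['#california', '#nationalparks']

# Hashing a list raises TypeError, which is what value_counts runs into.
try:
    hash(hashtags)
    list_is_hashable = True
except TypeError as exc:
    list_is_hashable = False
    error_message = str(exc)

# The same items wrapped in a tuple hash without complaint.
tuple_hash = hash(tuple(hashtags))

print(list_is_hashable)
print(error_message)
```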

Use a tuple instead: wherever the list of strings is being returned, wrap it in tuple(...). Tuples are ordered collections that behave much like lists, except that they are immutable and, as long as their elements are hashable (strings are), hashable themselves.
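Here is a rough sketch of how that could look in the loop from the question. The helper name collect_values, its flatten flag, and the two sample tweets are hypothetical and not from the original post. Note one caveat that is an addition to the advice above: wrapping each list in tuple(...) counts hashtag combinations, so if the goal is a count per individual hashtag, extending the result list with each tweet's hashtags (instead of appending the whole list) is probably what is wanted. The sketch uses collections.Counter so that it runs without pandas.

```python
from collections import Counter

# Hypothetical, trimmed-down tweets with the same nesting as in the question.
tweets = [
    {'entities': {'hashtags': ['#california', '#nationalparks']}},
    {'entities': {'hashtags': ['#california', '#politics']}},
]

def collect_values(input_list, first_key, second_key, flatten=True):
    """Gather the values stored at input_list[i][first_key][second_key]."""
    collected = []
    for tweet in input_list:
        values = tweet[first_key][second_key]
        if flatten:
            # One entry per hashtag: extend adds each string separately.
            collected.extend(values)
        else:
            # One entry per tweet: a tuple is hashable, a list is not.
            collected.append(tuple(values))
    return collected

# Frequency of each individual hashtag.
per_hashtag = Counter(collect_values(tweets, 'entities', 'hashtags'))

# Frequency of each hashtag combination (the tuple approach).
per_combination = Counter(
    collect_values(tweets, 'entities', 'hashtags', flatten=False)
)
```

With pandas available, passing either collected list to pd.Series(...).value_counts() should give the same counts as a Series, since every element is now a string or a tuple of strings.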

Upvotes: 2
