Brian

Reputation: 81

Creating a dictionary where the key is an integer and the value is a list of the words of that length

Super new to Python here, and I've been struggling with this code for a while now. Basically, the function should return a dictionary with integers as keys, where each value is the list of all the words whose length equals that key.

So far I'm able to create a dictionary where the values are the number of words of each length, but not the actual words themselves.

So passing the following text

"the faith that he had had had had an affect on his life"

to the function

def get_word_len_dict(text):
    result_dict = {'1':0, '2':0, '3':0, '4':0, '5':0, '6' :0}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))] += 1
    return result_dict

returns

1 - 0
2 - 3
3 - 6
4 - 2
5 - 1
6 - 1

Where I need the output to be:

2 - ['an', 'he', 'on']
3 - ['had', 'his', 'the']
4 - ['life', 'that']
5 - ['faith']
6 - ['affect']

I think I need to return the values as lists, but I'm not sure how to approach it.

Upvotes: 4

Views: 25850

Answers (8)

blackeneth

Reputation: 341

Your code is counting the occurrence of each word length - but not storing the words themselves.

In addition to capturing each word into a list of words with the same size, you also appear to want:

  1. If a word length is not represented, do not return an empty list for that length - just don't have a key for that length.
  2. No duplicates in each word list
  3. Each word list is sorted

A set container is ideal for accumulating the words - sets naturally eliminate any duplicates added to them.

Using defaultdict(set) will set up an empty dictionary of sets: a dictionary key is only created when it is referenced in the loop that examines each word.
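To see what defaultdict(set) does on its own, here is a small sketch (the name `groups` is just for illustration): a key only appears once it is referenced, and adding the same word twice leaves a single copy in the set.

```python
from collections import defaultdict

groups = defaultdict(set)
print(len(groups))  # 0: no keys exist yet

# Referencing groups[3] creates an empty set for key 3, then adds to it
groups[3].add("had")
groups[3].add("had")  # duplicate: the set keeps only one "had"
groups[2].add("he")

print(dict(groups))  # {3: {'had'}, 2: {'he'}}
```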

from collections import defaultdict 

def get_word_len_dict(text):

    #create empty dictionary of sets 
    d = defaultdict(set)

    # the key is the length of each word
    # The value is a growing set of words
    # sets automatically eliminate duplicates
    for word in text.split():
        d[len(word)].add(word)

    # the sets in the dictionary are unordered,
    # so sort each one into a new dictionary of lists;
    # iterating the keys in sorted order keeps the output ordered by length

    return {i: sorted(d[i]) for i in sorted(d)}

In your example string of

a="the faith that he had had had had an affect on his life"

Calling the function like this:

z=get_word_len_dict(a)

Returns the following dictionary:

print(z)
{2: ['an', 'he', 'on'], 3: ['had', 'his', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}

The type of each value in the dictionary is "list".

print(type(z[2]))
<class 'list'>

Upvotes: 2

dkasak

Reputation: 2703

You say you want the keys to be integers but then you convert them to strings before storing them as a key. There is no need to do this in Python; integers can be dictionary keys.
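A quick illustration of why mixing string and integer keys is a common pitfall: '3' and 3 are different keys, so a lookup with the wrong type finds nothing. The variable names are only for illustration.

```python
# Keys built with str(len(word)), as in the question
str_keyed = {str(len("had")): ["had"]}
# Keys built with len(word) directly
int_keyed = {len("had"): ["had"]}

print(3 in str_keyed)    # False: the key is the string '3'
print("3" in str_keyed)  # True
print(3 in int_keyed)    # True: no conversion needed
```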

Regarding your question, simply initialize the values of the keys to empty lists instead of the number 0. Then, in the loop, append the word to the list stored under the appropriate key (the length of the word), like this:

string = "the faith that he had had had had an affect on his life"

def get_word_len_dict(text):
    result_dict = {i : [] for i in range(1, 7)}
    for word in text.split():
        length = len(word)
        if length in result_dict:
            result_dict[length].append(word)
    return result_dict      

This results in the following:

>>> get_word_len_dict(string)
{1: [], 2: ['he', 'an', 'on'], 3: ['the', 'had', 'had', 'had', 'had', 'his'], 4: ['that', 'life'], 5: ['faith'], 6: ['affect']}

If, as you mentioned, you want to remove duplicate words, it seems elegant to collect them in a set and convert to a list as a final processing step, if a list is needed. Also note the use of defaultdict, which saves you from manually initializing the dictionary: a default value of set() (i.e. the empty set) is inserted for each key that we access, and only for those keys:

from collections import defaultdict

string = "the faith that he had had had had an affect on his life"

def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        length = len(word)
        result_dict[length].add(word)
    return {k : list(v) for k, v in result_dict.items()}

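This gives output like the following (the order of the keys, and of the words inside each list, can differ, because sets are unordered):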

>>> get_word_len_dict(string)
{2: ['he', 'on', 'an'], 3: ['his', 'had', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}
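If you want to drop duplicates but keep each word in the order it first appears (rather than sorted, or in arbitrary set order), one alternative is to collect the words as keys of a dict with None values, since dicts preserve insertion order in Python 3.7+. A sketch of that variation, assuming Python 3.7 or later:

```python
from collections import defaultdict

def get_word_len_dict(text):
    # each inner dict acts as an insertion-ordered set (Python 3.7+)
    seen = defaultdict(dict)
    for word in text.split():
        seen[len(word)][word] = None  # re-adding an existing word keeps its original position
    return {length: list(words) for length, words in seen.items()}

string = "the faith that he had had had had an affect on his life"
print(get_word_len_dict(string))
# {3: ['the', 'had', 'his'], 5: ['faith'], 4: ['that', 'life'], 2: ['he', 'an', 'on'], 6: ['affect']}
```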

Upvotes: 2

Cavaz

Reputation: 3119

The problem here is that you are counting the words by length, whereas you want to group them. You can achieve this by storing a collection of words instead of an int (a set is used below):

def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        if len(word) in result_dict:
            result_dict[len(word)].add(word)
        else:
            result_dict[len(word)] = {word} #using a set instead of list to avoid duplicates
    return result_dict

Other improvements:

  • don't hardcode the keys in the initialized dict; leave it empty instead and let the code add new keys dynamically when necessary
  • you can use ints as keys instead of strings, which saves you the conversion
  • use sets to avoid repetitions
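The three improvements can also be combined without the if/else by using dict.setdefault, which returns the existing value for a key, or inserts the given default and returns it. A minimal sketch:

```python
def get_word_len_dict(text):
    result_dict = {}  # start empty; keys are added only as needed
    for word in text.split():
        # int key; setdefault inserts an empty set the first time a length is seen
        result_dict.setdefault(len(word), set()).add(word)
    return result_dict

print(get_word_len_dict("the faith that he had had had had an affect on his life"))
```

One trade-off: set() is constructed on every call, even when the key already exists; defaultdict(set) avoids that.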

Using groupby

Well, I'll try to propose something different: you can group by length using itertools.groupby from the Python standard library:

import itertools

def get_word_len_dict(text):
    # sort by length, then group consecutive words of equal length;
    # groupby yields (key, iterator over the words with that key) pairs
    groups = itertools.groupby(sorted(text.split(), key=len), key=len)
    # convert to a dictionary with sets as values
    return {l: set(words) for l, words in groups}
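groupby only groups consecutive items that share a key, which is why the words are sorted by length first; without sorting, the same length can show up as several groups, and in the dict comprehension a later group would overwrite an earlier one. A small sketch of the difference:

```python
import itertools

words = ["he", "the", "an"]

# Without sorting, the two length-2 words are not adjacent, so key 2 appears twice
unsorted_keys = [k for k, _ in itertools.groupby(words, key=len)]
print(unsorted_keys)  # [2, 3, 2]

# Sorting by the same key first makes equal lengths adjacent
sorted_keys = [k for k, _ in itertools.groupby(sorted(words, key=len), key=len)]
print(sorted_keys)  # [2, 3]
```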

Upvotes: 2

MasterControlProgram

Reputation: 379

What you need is a map-to-list construct (fine if there are not many words; otherwise a 'Counter' would be fine). Each list stands for a word class (number of characters). The map is checked for whether the word class (e.g. '3') has been found before, and the list is checked for whether the word (e.g. 'had') has been found before.

def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        key = str(len(word))
        if key not in result_dict:  # add list to map?
            result_dict[key] = []

        if word not in result_dict[key]:  # add word to list?
            result_dict[key].append(word)

    return result_dict

Printing each key with its list gives (the key order may differ depending on the Python version):

3 ['the', 'had', 'his']
2 ['he', 'an', 'on']
5 ['faith']
4 ['that', 'life']
6 ['affect']

Upvotes: 2

Moinuddin Quadri

Reputation: 48120

Instead of setting the default value to 0, set it to set(), and inside the if block do result_dict[str(len(word))].add(word).

Also, instead of pre-populating result_dict, you should use collections.defaultdict.

Since you need non-repetitive words, I am using set as value instead of list.

Hence, your final code should be:

from collections import defaultdict
def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        result_dict[str(len(word))].add(word)
    return result_dict

If you really must have lists as values (I think a set should meet your requirement), you need to iterate over the result once more:

for key, value in result_dict.items():
    result_dict[key] = list(value)
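One thing to be aware of: this returns the defaultdict itself, so even a plain read such as result_dict['7'] silently inserts a new empty set. A short sketch (the function is copied from above so the snippet runs on its own; the input text is just an example) showing that converting to a plain dict with list values avoids this:

```python
from collections import defaultdict

def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        result_dict[str(len(word))].add(word)
    return result_dict

grouped = get_word_len_dict("the faith that he had")
_ = grouped["7"]       # a lookup on a defaultdict inserts the missing key
print("7" in grouped)  # True

# A plain dict with list values does not grow on lookup
plain = {key: list(value) for key, value in get_word_len_dict("the faith that he had").items()}
print("7" in plain)    # False
print(plain.get("7"))  # None, and nothing is inserted
```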

Upvotes: 2

tadek

Reputation: 100

Check out comprehensions (the code below uses a dict comprehension).

Integers are legal dictionary keys, so there is no need to make the numbers strings unless you want that for some other reason. The if statement in the for loop ensures each word is added only once. You could get this effect more automatically by using set() instead of list() as the value data structure. See more in the docs. I believe the following does the job:

def get_word_len_dict(text):
    result_dict = {len(word) : [] for word in text.split()}
    for word in text.split():
        if word not in result_dict[len(word)]:
            result_dict[len(word)].append(word) 
    return result_dict

Try to make it better ;)

Upvotes: 2

Paul Cornelius

Reputation: 11008

Fixing Sabian's answer so that duplicates aren't added to the list:

def get_word_len_dict(text):
    result_dict = {1:[], 2:[], 3:[], 4:[], 5:[], 6 :[]}
    for word in text.split():
        n = len(word)
        if n in result_dict and word not in result_dict[n]:
            result_dict[n].append(word)
    return result_dict
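Because the dictionary is pre-populated with keys 1 to 6 and the loop checks n in result_dict, words longer than six characters are skipped silently. A quick check with a made-up input (the function is copied so the snippet runs standalone):

```python
def get_word_len_dict(text):
    result_dict = {1: [], 2: [], 3: [], 4: [], 5: [], 6: []}
    for word in text.split():
        n = len(word)
        if n in result_dict and word not in result_dict[n]:
            result_dict[n].append(word)
    return result_dict

result = get_word_len_dict("an elephant had had tea")
print(result)  # {1: [], 2: ['an'], 3: ['had', 'tea'], 4: [], 5: [], 6: []}
# 'elephant' (8 letters) has no key, so it is dropped
```

If longer words should be kept, starting from an empty dict (or a defaultdict), as in the other answers, avoids this.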

Upvotes: 2

Sabian

Reputation: 155

I think what you want is a dict of lists.

def get_word_len_dict(text):
    result_dict = {'1':[], '2':[], '3':[], '4':[], '5':[], '6':[]}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))].append(word)
    return result_dict

Upvotes: 2
