user9733604

How to create a dictionary from a text?

I am a Python beginner. I have several long texts, each stored as a list of strings, and I would like to write a function that extracts the important information and returns a dictionary. The texts are formatted like this:

['text', 'text', 'text', 'text', 'text', 'text', 'text', 'Country Code', '11111', 'Country Location', 'North', 'Date', '18-03-1878', 'text', 'text', 'text', 'Population', '1289028', 'text', 'text', 'Government', 'Monarchy', 'text', 'text', 'Religion:', 'Catholic']

I need specific information, such as the country location, country code, and date. The position of these strings varies from text to text, so I need a function that first finds each label in the list, uses it as a key of my dictionary, and takes the next element in the list as the value. I was hoping to get an output like this:

{"Country Code": "11111",
 "Country Location": "North",
 "Date": "18-03-1878",
 "Population": "1289028",
 "Religion:": "Catholic"}

I really appreciate any help you guys can provide.

Upvotes: 0

Views: 65

Answers (1)

Alexis Drakopoulos

Reputation: 1145

If you don't care about efficiency and the keys are consistent, you can just write a loop:

your_list = ['text', 'text', 'text', 'text', 'text','text', 'text', 'Country Code', '11111', 'Country Location', 'North', 'Date', '18-03-1878', 'text','text','text', 'Population', '1289028', 'text', 'text', 'Government', 'Monarchy', 'text', 'text', 'Religion:', 'Catholic']

our_dict = {}

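# Note: `in` is a substring test here, so 'Date' would also match 'Update';
# use `word == 'Date'` for an exact match.
# your_list[idx+1] raises IndexError if a label is the last element of the list.
for idx, word in enumerate(your_list):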
    if 'Country Code' in word:
        our_dict['Country Code'] = your_list[idx+1]
    if 'Country Location' in word:
        our_dict['Country Location'] = your_list[idx+1]
    if 'Date' in word:
        our_dict['Date'] = your_list[idx+1]
    if 'Population' in word:
        our_dict['Population'] = your_list[idx+1]
    if 'Religion' in word:
        our_dict['Religion'] = your_list[idx+1]

To deal with your other issue of empty cells in your list, you can do:

for idx, word in enumerate(your_list):
    if len(word.strip(' ')) > 0:
        if 'Country Code' in word:
            our_dict['Country Code'] = your_list[idx+1]
        if 'Country Location' in word:
            our_dict['Country Location'] = your_list[idx+1]
        if 'Date' in word:
            our_dict['Date'] = your_list[idx+1]
        if 'Population' in word:
            our_dict['Population'] = your_list[idx+1]
        if 'Religion' in word:
            our_dict['Religion'] = your_list[idx+1]
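
If the empty cells can sit between a label and its value (an assumption; the list quoted in the question does not show this case), one option is to skip forward to the next non-blank element instead of always taking idx+1. The sketch below uses a hypothetical helper, next_non_blank, and a made-up sample list; neither comes from the question.

```python
def next_non_blank(items, start):
    """Return the first element at or after index `start` that is not blank, or None."""
    for candidate in items[start:]:
        if candidate.strip():
            return candidate
    return None


# Made-up sample: blank cells between 'Country Code' and its value,
# and a label ('Population') in the last position with no value at all.
cells = ['text', 'Country Code', '', ' ', '11111', 'Date', '18-03-1878', 'Population']
keys = ['Country Code', 'Date', 'Population']

found = {}
for idx, word in enumerate(cells):
    if word in keys:  # exact match against the list of labels
        value = next_non_blank(cells, idx + 1)
        if value is not None:  # a label with nothing after it is left out
            found[word] = value

print(found)
```

Be aware that if a value is missing entirely, this approach picks up whatever non-blank element comes next, which may be another label or unrelated text.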

Shorter solution:

# Create a set of the items you are interested in (a set holds only unique values)
itemstofind = {'Country Code', 'Country Location', 'Date', 'Population', 'Religion:'}

# use a dict comprehension to find the items and take the next item in the list
# assumes there is no error in the data (e.g. a key is never the last element)
d = {item: your_list[ind+1] for ind, item in enumerate(your_list) if item in itemstofind}
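
Since the question asks for a function, the same idea can be wrapped in one. This is only a sketch: the name extract_fields, the choice to strip a trailing colon from labels, and the sample list are illustrative assumptions, not part of the original question or answer.

```python
def extract_fields(tokens, wanted):
    """Return {label: next element} for every wanted label found in tokens."""
    result = {}
    # Pair each element with the one after it. zip stops at the shorter input,
    # so a label in the very last position is ignored instead of raising IndexError.
    for current, following in zip(tokens, tokens[1:]):
        label = current.strip().rstrip(':')  # treat 'Religion:' and 'Religion' alike
        if label in wanted:
            result[label] = following
    return result


# Made-up sample; 'Population' is last on purpose and has no value.
sample = ['text', 'Country Code', '11111', 'text', 'Date', '18-03-1878',
          'Religion:', 'Catholic', 'Population']
fields = extract_fields(sample, {'Country Code', 'Date', 'Population', 'Religion'})
print(fields)
# {'Country Code': '11111', 'Date': '18-03-1878', 'Religion': 'Catholic'}
```

Passing the wanted labels as a set keeps each membership test cheap, and the function can be called once per text with the same set.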

Upvotes: 1
