Cleb

Reputation: 25997

How to efficiently convert the entries of a dictionary into a dataframe

I have a dictionary like this:

mydict = {'A': 'some thing',
          'B': 'couple of words'}

All the values are strings of whitespace-separated words. My goal is to convert this into a dataframe which looks like this:

  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words

So I want to split the strings and then put each word, together with its associated key, into its own row of the dataframe.

A quick implementation could look like this:

import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}

all_words = " ".join(mydict.values()).split()
df = pd.DataFrame(columns=['key_val', 'splitted_words'], index=range(len(all_words)))

indi = 0
for item in mydict.items():
    words = item[1].split()
    for word in words:
        # A single .loc call avoids chained assignment, which may write to a copy
        df.loc[indi, 'key_val'] = item[0]
        df.loc[indi, 'splitted_words'] = word
        indi += 1

which gives me the desired output.

However, I am wondering whether there is a more efficient solution to this?

Upvotes: 2

Views: 104

Answers (2)

Dilettant

Reputation: 3335

Based on @qu-dong's idea, here is a working example that uses a generator function for readability:

#! /usr/bin/env python
from __future__ import print_function
import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}


def splitting_gen(in_dict):
    """Generator function to split in_dict items on space."""
    for k, v in in_dict.items():
        for s in v.split():
            yield k, s

df = pd.DataFrame(splitting_gen(mydict), columns=['key_val', 'splitted_words'])
print(df)

#   key_val splitted_words
# 0       A           some
# 1       A          thing
# 2       B         couple
# 3       B             of
# 4       B          words

# real    0m0.463s
# user    0m0.387s
# sys     0m0.057s

But this only addresses efficiency in terms of the elegance/readability of the requested solution.

Note that the timings of all the variants are alike, a tad shorter than 500 milliseconds. So one might continue to profile further to avoid suffering when feeding in larger texts ;-)
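As a rough sketch of what such further profiling could look like: whole-script wall-clock figures like the ones above probably include interpreter start-up and the pandas import, so timing only the conversion with `timeit` on a larger input may be more telling. The input `big_dict`, its size, and the helper names `build_from_generator` and `build_from_comprehension` are assumptions made up for this illustration.

```python
import timeit

import pandas as pd

COLUMNS = ['key_val', 'splitted_words']

# Hypothetical larger input: 1,000 keys, each mapped to 50 space-separated words.
big_dict = {'key%d' % i: ' '.join('w%d' % j for j in range(50))
            for i in range(1000)}


def splitting_gen(in_dict):
    """Yield one (key, word) pair per whitespace-separated word."""
    for k, v in in_dict.items():
        for s in v.split():
            yield k, s


def build_from_generator(in_dict):
    """Build the dataframe from the generator function."""
    return pd.DataFrame(splitting_gen(in_dict), columns=COLUMNS)


def build_from_comprehension(in_dict):
    """Build the dataframe from a list comprehension of (key, word) tuples."""
    rows = [(k, s) for k, v in in_dict.items() for s in v.split()]
    return pd.DataFrame(rows, columns=COLUMNS)


REPEATS = 5
gen_seconds = timeit.timeit(lambda: build_from_generator(big_dict), number=REPEATS)
comp_seconds = timeit.timeit(lambda: build_from_comprehension(big_dict), number=REPEATS)

# Only the conversion is timed here, not the import or interpreter start-up.
print('generator:     %.4f s per call' % (gen_seconds / REPEATS))
print('comprehension: %.4f s per call' % (comp_seconds / REPEATS))
```

The absolute numbers depend on the machine and the pandas version, so they are best read as a relative comparison.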

Upvotes: 4

2342G456DI8

Reputation: 1809

Here is my one-line approach:

df = pd.DataFrame([(k, s) for k, v in mydict.items() for s in v.split()], columns=['key_val','splitted_words'])

Split into two steps, it becomes:

d = [(k, s) for k, v in mydict.items() for s in v.split()]
df = pd.DataFrame(d, columns=['key_val','splitted_words'])

Output:

Out[41]: 
  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words
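For comparison, the same result can probably also be reached with pandas' own string methods instead of a Python-level comprehension. This is only a sketch and not part of the answer above; it assumes pandas 0.25 or newer, where `Series.explode` is available, and it is not claimed to be faster without measuring.

```python
import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}

df = (
    pd.Series(mydict)           # index: keys, values: the raw strings
    .str.split()                # each value becomes a list of words
    .explode()                  # one row per word, key repeated in the index
    .rename_axis('key_val')     # name the index so it becomes a column
    .reset_index(name='splitted_words')
)
print(df)
```

Whether this beats the list comprehension on large inputs would need to be measured.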

Upvotes: 4
