Reputation: 25997
I have a dictionary like this:
mydict = {'A': 'some thing',
          'B': 'couple of words'}
All the values are strings of whitespace-separated words. My goal is to convert this into a dataframe which looks like this:
  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words
So I want to split the strings and then put each word, together with its associated key, into its own row of the dataframe.
A quick implementation could look like this:
import pandas as pd
mydict = {'A': 'some thing',
          'B': 'couple of words'}
all_words = " ".join(mydict.values()).split()
df = pd.DataFrame(columns=['key_val', 'splitted_words'], index=range(len(all_words)))
indi = 0
for key, text in mydict.items():
    for word in text.split():
        # .loc with row and column labels avoids chained assignment,
        # which may write to a temporary copy instead of df
        df.loc[indi, 'key_val'] = key
        df.loc[indi, 'splitted_words'] = word
        indi += 1
This gives me the desired output.
However, I am wondering whether there is a more efficient solution to this.
Upvotes: 2
Views: 104
Reputation: 3335
Based on @qu-dong's idea, here is a working example that uses a generator function for readability:
#! /usr/bin/env python
from __future__ import print_function
import pandas as pd
mydict = {'A': 'some thing',
'B': 'couple of words'}
def splitting_gen(in_dict):
    """Generator function to split in_dict items on space."""
    for k, v in in_dict.items():
        for s in v.split():
            yield k, s
df = pd.DataFrame(splitting_gen(mydict), columns=['key_val', 'splitted_words'])
print(df)
#   key_val splitted_words
# 0       A           some
# 1       A          thing
# 2       B         couple
# 3       B             of
# 4       B          words
# real 0m0.463s
# user 0m0.387s
# sys 0m0.057s
However, this only improves the elegance and readability of the requested solution, not its speed.
If you compare the timings, they are all alike, at a bit under 500 milliseconds. So one might want to profile further to avoid suffering when feeding in larger texts ;-)
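As a rough sketch of how such further profiling could look, the snippet below times the generator variant against a plain list comprehension with the standard `timeit` module. The input `big_dict` and the helper names `build_with_generator` and `build_with_list` are made up for illustration and are not part of the original post; actual numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical larger input: 1000 keys, each mapping to a five-word string.
big_dict = {f"key{i}": "some couple of more words" for i in range(1000)}
COLUMNS = ["key_val", "splitted_words"]


def splitting_gen(in_dict):
    """Yield one (key, word) pair per whitespace-separated word."""
    for k, v in in_dict.items():
        for s in v.split():
            yield k, s


def build_with_generator(in_dict):
    """Build the dataframe from the generator of (key, word) pairs."""
    return pd.DataFrame(splitting_gen(in_dict), columns=COLUMNS)


def build_with_list(in_dict):
    """Build the dataframe from a fully materialized list of pairs."""
    pairs = [(k, s) for k, v in in_dict.items() for s in v.split()]
    return pd.DataFrame(pairs, columns=COLUMNS)


for func in (build_with_generator, build_with_list):
    # Best of 3 repeats with 10 calls each; only the dataframe
    # construction is measured, not the import of pandas.
    best = min(timeit.repeat(lambda: func(big_dict), number=10, repeat=3))
    print(f"{func.__name__}: {best / 10:.6f} s per call")
```

Measuring inside the process like this separates the cost of building the dataframe from the interpreter start-up and the pandas import.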
Upvotes: 4
Reputation: 1809
Here is my one-line approach:
df = pd.DataFrame([(k, s) for k, v in mydict.items() for s in v.split()], columns=['key_val','splitted_words'])
Split into two steps, it becomes:
d = [(k, s) for k, v in mydict.items() for s in v.split()]
df = pd.DataFrame(d, columns=['key_val','splitted_words'])
Output:
Out[41]:
  key_val splitted_words
0       A           some
1       A          thing
2       B         couple
3       B             of
4       B          words
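For comparison, the same reshaping can also be sketched with pandas string methods instead of a Python-level loop. This is a different technique from the list comprehension above, and it assumes pandas 0.25 or newer, where `Series.explode` is available. Whether it is actually faster on a given input would need to be measured.

```python
import pandas as pd

mydict = {'A': 'some thing',
          'B': 'couple of words'}

df = (
    pd.Series(mydict, name='splitted_words')  # dict keys become the index
    .str.split()                              # each value becomes a list of words
    .explode()                                # one row per word, key repeated in the index
    .rename_axis('key_val')                   # name the index so it becomes a column
    .reset_index()                            # move the keys into a regular column
)
print(df)
```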
Upvotes: 4