Reputation: 71
I'm having trouble splitting the strings in a dictionary into multiple rows of a DataFrame for each key. So far I couldn't find a proper solution. Any help is appreciated.
The following code splits the strings, but it produces only one row per key:
d_new = {k: dict(map(str.strip, x.split('||')) for x in v) for k, v in d.items()}
df = pd.DataFrame.from_dict(d_new, orient='index')
My dictionary d looks like this:
{'Key1': ['A||1234', 'A||1235', 'A||1236', 'B||4567', 'C||78910'],
'Key2': ['A||165', 'A||135', 'B||888', 'B||1111']}
I want to split the data so that Key1 has 3 rows (one for each of the three values of A) and Key2 has 2 rows.
Desired Output:
Key|A|B|C
Key1|1234|4567|78910
Key1|1235|4567|78910
Key1|1236|4567|78910
Key2|165|888|
Key2|135|1111|
Edit1: I'm sorry, I don't know how to make a table here. I added the desired output as well as I could.
Upvotes: 1
Views: 200
Reputation: 164773
The problem is that you need to construct a separate DataFrame for each key's list of values and then concatenate them. Here's a solution using collections.defaultdict:
d = {'Key1': ['A||1234', 'A||1235', 'A||1236', 'B||4567', 'C||78910'],
'Key2': ['A||165', 'A||135', 'B||888', 'B||1111']}
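import pandas as pd
from collections import defaultdict

def create_dataframe(k, x):
    # Group the values by their prefix, e.g. {'A': ['1234', '1235', '1236'], ...}
    dd = defaultdict(list)
    for item in x:
        key, value = item.split('||')
        dd[key].append(value)
    # One column per prefix; shorter columns are padded, then forward-filled
    return pd.DataFrame.from_dict(dd, orient='index').T.assign(Key=k).ffill()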
df = pd.concat(create_dataframe(*item) for item in d.items())
print(df)
      A     B      C   Key
0  1234  4567  78910  Key1
1  1235  4567  78910  Key1
2  1236  4567  78910  Key1
0   165   888    NaN  Key2
1   135  1111    NaN  Key2
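As a side note on why the attempt in the question yields only one row per key: dict() keeps just the last value seen for a repeated key, so the three A entries collapse into one, whereas defaultdict(list) keeps all of them. A minimal sketch using only the Key1 list from the question (the names items, collapsed and grouped are illustrative, not from the original post):

```python
from collections import defaultdict

# The list stored under 'Key1' in the question's dictionary
items = ['A||1234', 'A||1235', 'A||1236', 'B||4567', 'C||78910']

# Approach from the question: dict() overwrites repeated keys,
# so only the last 'A' value survives
collapsed = dict(x.split('||') for x in items)
print(collapsed)      # {'A': '1236', 'B': '4567', 'C': '78910'}

# Approach from the answer: collect every value per prefix in a list
grouped = defaultdict(list)
for item in items:
    name, value = item.split('||')
    grouped[name].append(value)
print(dict(grouped))  # {'A': ['1234', '1235', '1236'], 'B': ['4567'], 'C': ['78910']}
```

The lists in grouped have different lengths, which is why the solution above builds the frame with orient='index', transposes it, and forward-fills the shorter columns.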
Upvotes: 1