Reputation: 7703
I have a dataframe like as shown below
td = {966: [('Feat1', -0.04),
('Feat2=True ', -0.02),
('Feat3 <= 20000.00', 0.01),
('Feat4=Power Supply', -0.01),
('Feat5=dada', -0.0)],
879: [('Feat8=Rare', 0.02),
('Feat11=HV', -0.01),
('Feat21=Power Supply', -0.01),
('20000.00 < Feat3 <= 50000.00', 0.01),
('Feat5=dada', -0.01)]}
I would like to do the below
a) Split the tuple within dict based on ,
comma seperator
b) store the numeric part in value
column of dataframe and text part in feature
column of dataframe
c) repeat the key values for all values in dataframe (and store it in key
column)
I tried the below but it is not efficient/elegant and doesn't scale for big data of million rows
feature=[]
value=[]
key=[]
for k, v in td.items():
for x in v:
key.append(k)
f, v = x
feature.append(f)
value.append(v)
data_tuples = list(zip(key,feature,value))
pd.DataFrame(data_tuples, columns=['key','feature','value'])
I expect my output to be like as shown below
Upvotes: 1
Views: 245
Reputation: 862591
Use generator comprehension with flatten values and pass to DataFrame
constructor:
df = pd.DataFrame()(k,b,c) for k, v in td.items() for b, c in v),
columns=['key','feature','value'])
print (df)
key feature value
0 966 Feat1 -0.04
1 966 Feat2=True -0.02
2 966 Feat3 <= 20000.00 0.01
3 966 Feat4=Power Supply -0.01
4 966 Feat5=dada -0.00
5 879 Feat8=Rare 0.02
6 879 Feat11=HV -0.01
7 879 Feat21=Power Supply -0.01
8 879 20000.00 < Feat3 <= 50000.00 0.01
9 879 Feat5=dada -0.01
Upvotes: 1
Reputation: 148890
You can even use a generator comprehension for the data to avoid building a full list in memory:
pd.DataFrame(([k, elt[0], elt[1]] for k,v in td.items() for elt in v),
columns = ['key', 'Feature', 'Value'])
key Feature Value
0 966 Feat1 -0.04
1 966 Feat2=True -0.02
2 966 Feat3 <= 20000.00 0.01
3 966 Feat4=Power Supply -0.01
4 966 Feat5=dada -0.00
5 879 Feat8=Rare 0.02
6 879 Feat11=HV -0.01
7 879 Feat21=Power Supply -0.01
8 879 20000.00 < Feat3 <= 50000.00 0.01
9 879 Feat5=dada -0.01
Upvotes: 2