Reputation: 3038
I have data that looks like this:
data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
This is just a small part of the data that I have extracted. As you can see, there is no value available for K. So I thought maybe I could use pandas to fix this. This is what I do:
import pandas as pd
import numpy as np
df = pd.DataFrame(data).fillna(0)
But df.fillna(0) has no effect here, since there is no None in the data.
So I tried df.replace(r'^\s*$', np.nan, regex=True), which would replace any empty string with NaN, but even this didn't help.
So what can I do to fill the missing data?
Note: I will not always receive the data in this format. I may also receive it as a flat list, like this:
data = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
What I am looking for is a generic solution in pandas to fill the missing values.
Upvotes: 3
Views: 821
Reputation: 75120
IIUC, you may have either a list or a list of lists. If so, try a function:
data1=[[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953),
('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
data2 = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
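import itertools
import pandas as pd

def myfunc(x):
    if isinstance(x[0], list):
        # Nested input (list of lists): flatten one level before building the frame
        return pd.DataFrame(itertools.chain.from_iterable(x)).fillna(0)
    else:
        # Flat input (list of tuples): build the frame directly
        return pd.DataFrame(x).fillna(0)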
print(myfunc(data1))
0 1
0 A 204.593565
1 B 217.421341
2 C 237.296250
3 D 217.464282
4 E 206.329901
5 F 210.297626
6 G 228.117693
7 H 4.000000
8 I 265.319671
9 K 0.000000
print(myfunc(data2))
0 1
0 F 210.297626
1 G 228.117693
2 H 4.000000
3 I 265.319671
4 K 0.000000
Upvotes: 1
Reputation: 3770
Use DataFrame.applymap (renamed to DataFrame.map in pandas 2.1):
df.applymap(lambda x: (x[0],0) if len(x) == 1 else x)
0 1 2 \
0 (A, 204.593564568) (B, 217.421341061) (C, 237.296250326)
1 (F, 210.297625953) (G, 228.117692718) (H, 4)
3 4
0 (D, 217.464281998) (E, 206.329901299)
1 (I, 265.319671257) (K, 0)
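The output above also hints at why fillna(0) had no effect on the original frame: each cell holds a whole tuple, so pandas sees no missing value. A minimal sketch to check this, using the nested data from the question (the names wide and tall are only illustrative):

```python
import pandas as pd
from itertools import chain

data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326),
         ('D', 217.464281998), ('E', 206.329901299)],
        [('F', 210.297625953), ('G', 228.117692718), ('H', 4),
         ('I', 265.319671257), ('K',)]]

# Built from the nested list: one row per inner list, each cell is a whole tuple.
wide = pd.DataFrame(data)
print(wide.shape)                     # (2, 5)
print(int(wide.isna().sum().sum()))   # 0, so fillna has nothing to fill

# Built from the flattened list: one row per tuple, the short tuple is padded with NaN.
tall = pd.DataFrame(list(chain.from_iterable(data)))
print(tall.shape)                     # (10, 2)
print(int(tall.isna().sum().sum()))   # 1, the missing value for K
```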
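Alternative, since the edit:
Why don't you flatten your tuples first? See below (using ndarray.flatten):
data = list(np.array(data, dtype=object).flatten()) # works for a list or a list of lists; dtype=object is needed on NumPy >= 1.24 because the tuples have unequal lengths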
##data
[('A', 204.593564568),
('B', 217.421341061),
('C', 237.296250326),
('D', 217.464281998),
('E', 206.329901299),
('F', 210.297625953),
('G', 228.117692718),
('H', 4),
('I', 265.319671257),
('K',)]
and then,
pd.DataFrame(data).fillna(0)
0 1
0 A 204.593565
1 B 217.421341
2 C 237.296250
3 D 217.464282
4 E 206.329901
5 F 210.297626
6 G 228.117693
7 H 4.000000
8 I 265.319671
9 K 0.000000
Upvotes: 4
Reputation: 92884
Here you go:
In [299]: data = [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
In [300]: pd.DataFrame(data).fillna(0).to_records(index=False).tolist()
Out[300]:
[('F', 210.297625953),
('G', 228.117692718),
('H', 4.0),
('I', 265.319671257),
('K', 0.0)]
For the case with nested lists:
In [308]: data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E',
...: 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]
...: ]
In [309]: from itertools import chain
In [310]: pd.DataFrame(chain.from_iterable(data)).fillna(0).to_records(index=False).tolist()
Out[310]:
[('A', 204.593564568),
('B', 217.421341061),
('C', 237.296250326),
('D', 217.464281998),
('E', 206.329901299),
('F', 210.297625953),
('G', 228.117692718),
('H', 4.0),
('I', 265.319671257),
('K', 0.0)]
Upvotes: 2
Reputation: 7812
If I understand your problem correctly, you can add None using the following list comprehension:
data = [[('A', 204.593564568), ('B', 217.421341061), ('C', 237.296250326), ('D', 217.464281998), ('E', 206.329901299)], [('F', 210.297625953), ('G', 228.117692718), ('H', 4), ('I', 265.319671257), ('K',)]]
new_data = [[t if len(t) == 2 else (*t, None) for t in l] for l in data]
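The comprehension above assumes nested input. A rough sketch of how the same idea might cover the flat format from the question as well; pad_pairs and the column names key and value are hypothetical names made up for illustration, and the sample data is shortened:

```python
import pandas as pd

def pad_pairs(data, fill=None):
    """Flatten one level of nesting (if present) and pad 1-tuples to pairs."""
    # Assumption: the input is nested when its first element is a list.
    if data and isinstance(data[0], list):
        rows = [t for sub in data for t in sub]
    else:
        rows = list(data)
    return [t if len(t) == 2 else (*t, fill) for t in rows]

nested = [[('A', 1.5), ('B', 2.5)], [('H', 4), ('K',)]]
flat = [('H', 4), ('K',)]

print(pad_pairs(nested))        # [('A', 1.5), ('B', 2.5), ('H', 4), ('K', None)]
print(pad_pairs(flat, fill=0))  # [('H', 4), ('K', 0)]

# The padded pairs can then go into pandas, where fillna replaces the placeholder.
df = pd.DataFrame(pad_pairs(nested), columns=['key', 'value']).fillna(0)
print(df)
```

Padding in plain Python keeps the original value types (4 stays an int) until the data is handed to pandas.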
Upvotes: 1