Reputation: 588
I have a pandas DataFrame df:
import numpy as np
import pandas as pd
df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
"type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
"F_ID" :["0", "[7 8 9]", "[10]", "0", "[2]", "0", "0", "0", "0"]})
# convert the string representations of list structures to actual lists
F_ID_as_series_of_lists = df["F_ID"].str.replace("[","").str.replace("]","").str.split(" ")
#type(F_ID_as_series_of_lists) is pd.Series, make it a list for pd.DataFrame.from_records
F_ID_as_records = list(F_ID_as_series_of_lists)
f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)
I am getting an error on this line:
f_id_df = pd.DataFrame.from_records(list(F_ID_as_records)).fillna(np.nan)
The error is: TypeError: object of type 'float' has no len()
How can I solve this?
Upvotes: 3
Views: 5304
Reputation: 216
There is another way using list comprehensions and utilizing what we've learned from the type error itself.
Say that you have a pandas Series of strings, and you want to split it into two parts on the '/' symbol, but not all rows are populated.
df = pd.DataFrame({'TEXT_COLUMN' : ['12/4', '54/19', np.nan, '89/33']})
We want to divide that column into two separate columns, but the missing value breaks the conversion back into a DataFrame, so let's first put the split result in a list:
split_list = list(df.TEXT_COLUMN.str.split('/'))
split_list contains the following, and we can see why we get a float error when attempting to parse it (the missing entry is a float NaN, not a list):
>> [['12', '4'], ['54', '19'], nan, ['89', '33']]
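To make the source of the error concrete, here is a minimal sketch (the variable names are illustrative, and dtype=object is an assumption to keep the behavior the same across pandas versions). It shows that str.split passes the missing value through as a float NaN, and that calling len() on it raises the TypeError from the question:

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption so the strings are stored as plain Python objects
s = pd.Series(["12/4", "54/19", np.nan, "89/33"], dtype=object)
parts = list(s.str.split("/"))

# The missing entry comes back as a float NaN, not as a list
missing = parts[2]
is_float_nan = isinstance(missing, float) and np.isnan(missing)

# Anything that asks for the length of each record fails on that float
try:
    len(missing)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_float_nan)
print(error_message)
```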
Now that we have that list, we want to then place it in a comprehension that corrects for the null value issue. We can do so by creating a conditional on type within the comprehension:
# np.float was removed from NumPy, so check for a real list instead
better_split_list = [x if isinstance(x, list) else [None, None] for x in split_list]
better_split_list now contains:
>> [['12','4'],['54','19'], [None,None], ['89','33']]
This puts us in a good place to turn the list of lists into its own pandas DataFrame, with the columns separated in a more robust way:
pd.DataFrame(better_split_list, columns = ['VALUE_1','VALUE_2'])
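Putting the steps above together, a self-contained sketch might look like this. The isinstance check and the index=df.index argument are my additions (not part of the steps above); the latter keeps the new frame aligned with the original rows:

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption so the example behaves the same across pandas versions
df = pd.DataFrame({"TEXT_COLUMN": ["12/4", "54/19", np.nan, "89/33"]}, dtype=object)

split_list = list(df["TEXT_COLUMN"].str.split("/"))

# Keep real lists; replace anything else (here the float NaN) with a placeholder pair
better_split_list = [x if isinstance(x, list) else [None, None] for x in split_list]

result = pd.DataFrame(
    better_split_list,
    columns=["VALUE_1", "VALUE_2"],
    index=df.index,  # align with the original rows
)
print(result)
```

Note that the placeholder [None, None] assumes every populated row splits into exactly two parts; rows with more or fewer parts would need a different placeholder length.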
Upvotes: 1
Reputation: 862481
The problem is evidently some None or NaN values, but if you use str.split with the parameter expand=True to build the new DataFrame, it handles them correctly.
Also, instead of replace, you can use str.strip:
df = pd.DataFrame({"ID": [2,3,4,5,6,7,8,9,10],
"type" :["A", "B", "B", "A", "A", "B", "A", "A", "A"],
"F_ID" :[None, "[7 8 9]", "[10]", np.nan, "[2]", "0", "0", "0", "0"]})
print (df)
   ID type     F_ID
0   2    A     None
1   3    B  [7 8 9]
2   4    B     [10]
3   5    A      NaN
4   6    A      [2]
5   7    B        0
6   8    A        0
7   9    A        0
8  10    A        0
f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True)
print (f_id_df)
      0     1     2
0  None  None  None
1     7     8     9
2    10  None  None
3   NaN   NaN   NaN
4     2  None  None
5     0  None  None
6     0  None  None
7     0  None  None
8     0  None  None
Finally, if you need to convert the values to numeric:
f_id_df = df["F_ID"].str.strip("[]").str.split(expand=True).astype(float)
print (f_id_df)
      0    1    2
0   NaN  NaN  NaN
1   7.0  8.0  9.0
2  10.0  NaN  NaN
3   NaN  NaN  NaN
4   2.0  NaN  NaN
5   0.0  NaN  NaN
6   0.0  NaN  NaN
7   0.0  NaN  NaN
8   0.0  NaN  NaN
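If some cells might contain text that is not a number, astype(float) would raise. As a hedged alternative (this is a swapped-in technique, not part of the answer above), pd.to_numeric with errors="coerce" turns unparseable values into NaN, and the nullable Int64 dtype keeps whole numbers without the trailing ".0". The shortened sample data is illustrative:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the F_ID column; dtype=object is an assumption
f_id = pd.Series([None, "[7 8 9]", "[10]", np.nan, "[2]", "0"], dtype=object)

# Strip the brackets, then split on whitespace into separate columns
parts = f_id.str.strip("[]").str.split(expand=True)

# errors="coerce" converts anything unparseable to NaN instead of raising
numeric = parts.apply(pd.to_numeric, errors="coerce")

# Optional: nullable integers represent missing values as <NA>
as_int = numeric.astype("Int64")
print(as_int)
```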
Upvotes: 1