Reputation: 189
My data contains columns with empty cells that pandas reads as nan.
I want to create a dictionary of lists from this data. However, some of the lists contain nan, and I want to remove it.
If I use dropna(), as in data.dropna().to_dict(orient='list'), it removes every row that contains at least one nan, so I lose data.
Col1  Col2  Col3
a     x     r
b     y     v
c           x
            z
import sys
import pandas as pd
data = pd.read_csv(sys.argv[2], sep=',')
dict = data.to_dict(orient='list')
Current output:
dict = {'Col1': ['a', 'b', 'c', nan], 'Col2': ['x', 'y', nan, nan], 'Col3': ['r', 'v', 'x', 'z']}
Desired output:
dict = {'Col1': ['a', 'b', 'c'], 'Col2': ['x', 'y'], 'Col3': ['r', 'v', 'x', 'z']}
My goal: get a dictionary of lists, with nan removed from each list.
Upvotes: 2
Views: 1995
Reputation: 2980
I'm not sure exactly what format you're expecting, but you can do this with a list comprehension and itertuples.
First create some data.
import pandas as pd
import numpy as np
data = pd.DataFrame.from_dict({'Col1': (1, 2, 3), 'Col2': (4, 5, 6), 'Col3': (7, 8, np.nan)})
print(data)
Giving a data frame of:
   Col1  Col2  Col3
0     1     4   7.0
1     2     5   8.0
2     3     6   NaN
And then we create the dictionary using the iterator.
dict_1 = {x[0]: [y for y in x[1:] if not pd.isna(y)] for x in data.itertuples(index=True) }
print(dict_1)
>>>{0: [1, 4, 7.0], 1: [2, 5, 8.0], 2: [3, 6]}
To do the same for the columns is even easier:
dict_2 = {column: [y for y in data[column] if not pd.isna(y)] for column in data}
print(dict_2)
>>>{'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7.0, 8.0]}
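As a possible alternative to the list comprehension, each column can drop its own missing values with Series.dropna() before being turned into a list. This is only a sketch on the sample data from the question: csv_text is a stand-in for the real file (an assumption), and result is just an illustrative name.

```python
import io

import pandas as pd

# Stand-in for the CSV file from the question (assumed layout).
csv_text = "Col1,Col2,Col3\na,x,r\nb,y,v\nc,,x\n,,z\n"
data = pd.read_csv(io.StringIO(csv_text))

# Drop NaN per column rather than per row, so no other column loses values.
result = {col: data[col].dropna().tolist() for col in data.columns}
print(result)
# {'Col1': ['a', 'b', 'c'], 'Col2': ['x', 'y'], 'Col3': ['r', 'v', 'x', 'z']}
```

Because dropna() runs on one column at a time, the resulting lists can have different lengths, which is what the desired output asks for.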
Upvotes: 3
Reputation: 2090
I am not sure I understand your question correctly, but if what you want is to replace the nan with a value so that you don't lose your data, then what you are looking for is the pandas.DataFrame.fillna function. You mentioned that the original value is an empty cell, so you can use data.fillna(''), which fills each nan with an empty string.
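A minimal sketch of what that looks like on the sample data from the question, assuming the file has the layout shown there (csv_text and filled are illustrative names, not from the original post):

```python
import io

import pandas as pd

# Stand-in for the CSV file from the question (assumed layout).
csv_text = "Col1,Col2,Col3\na,x,r\nb,y,v\nc,,x\n,,z\n"
data = pd.read_csv(io.StringIO(csv_text))

# Replace every NaN with an empty string, then build the dictionary of lists.
filled = data.fillna('').to_dict(orient='list')
print(filled)
# {'Col1': ['a', 'b', 'c', ''], 'Col2': ['x', 'y', '', ''], 'Col3': ['r', 'v', 'x', 'z']}
```

Note that every list keeps the same length here: the missing entries become '' placeholders instead of being removed.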
EDIT: Now that you have provided the desired output, the answer to your question changes a bit. What you'll need to do is use a dict comprehension with a nested list comprehension to build the dictionary, looping over the columns and filtering out nan. I see that Andrew already provided the code to do this in his answer, so have a look there.
Upvotes: 1