gusa10

Reputation: 189

Remove 'nan' from Dictionary of list

My data contains columns with empty cells, which pandas reads as nan. I want to create a dictionary of lists from this data. However, some of the lists contain nan, and I want to remove those values.

If I use dropna(), as in data.dropna().to_dict(orient='list'), it removes every row that contains at least one nan, so I lose data.

Col1 Col2  Col3
a     x     r
b     y     v
c           x
            z



import sys
import pandas as pd
data = pd.read_csv(sys.argv[2], sep=',')
dict = data.to_dict(orient='list')

Current output:
dict = {'Col1': ['a', 'b', 'c', nan], 'Col2': ['x', 'y', nan, nan], 'Col3': ['r', 'v', 'x', 'z']}

Desired output:
dict = {'Col1': ['a', 'b', 'c'], 'Col2': ['x', 'y'], 'Col3': ['r', 'v', 'x', 'z']}

My goal: get a dictionary of lists with the nan values removed from each list.

Upvotes: 2

Views: 1995

Answers (2)

Andrew McDowell

Reputation: 2980

I'm not sure exactly what format you're expecting, but you can use a list comprehension and itertuples to do this.

First create some data.

import pandas as pd
import numpy as np

data = pd.DataFrame.from_dict({'Col1': (1, 2, 3), 'Col2': (4, 5, 6), 'Col3': (7, 8, np.nan)})
print(data)

Giving a data frame of:

   Col1  Col2  Col3
0     1     4   7.0
1     2     5   8.0
2     3     6   NaN

And then we create the dictionary using the iterator.

dict_1 = {x[0]: [y for y in x[1:] if not pd.isna(y)] for x in data.itertuples(index=True)}

print(dict_1)
# Output: {0: [1, 4, 7.0], 1: [2, 5, 8.0], 2: [3, 6]}

To do the same for the columns is even easier:

dict_2 = {column: [y for y in data[column] if not pd.isna(y)] for column in data}

print(dict_2)
# Output: {'Col1': [1, 2, 3], 'Col2': [4, 5, 6], 'Col3': [7.0, 8.0]}
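
As a possible alternative to the list comprehension, each column's own dropna() can do the filtering. This is only a sketch: the in-memory CSV (csv_text) is a hypothetical stand-in for the file from the question, built from the table shown there.

```python
import io
import pandas as pd

# Hypothetical stand-in for the question's CSV file (empty cells become NaN)
csv_text = "Col1,Col2,Col3\na,x,r\nb,y,v\nc,,x\n,,z\n"
data = pd.read_csv(io.StringIO(csv_text))

# Drop NaN per column rather than per row, so no other column loses values
result = {column: data[column].dropna().tolist() for column in data}
print(result)
```

Because dropna() is applied to one column at a time, the lists can end up with different lengths, which is what the desired output in the question asks for.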

Upvotes: 3

Alexander Rossa

Reputation: 2090

I am not sure I understand your question correctly, but if what you want is to replace the nan values with something else so that you don't lose your data, then you are looking for the pandas.DataFrame.fillna function. You mentioned the original values are empty cells, so data.fillna('') would fill each nan with an empty string.
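
A minimal sketch of the fillna approach, assuming a small frame rebuilt by hand from the table in the question (the values here are illustrative, not read from the real file):

```python
import numpy as np
import pandas as pd

# Hand-built frame mirroring the question's table; np.nan marks the empty cells
data = pd.DataFrame({
    'Col1': ['a', 'b', 'c', np.nan],
    'Col2': ['x', 'y', np.nan, np.nan],
    'Col3': ['r', 'v', 'x', 'z'],
})

# Replace every NaN with an empty string, then convert to a dict of lists
filled = data.fillna('').to_dict(orient='list')
print(filled)
```

Note that this keeps every list the same length: the missing entries are still present, just as empty strings instead of nan.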

EDIT: Now that you have provided the desired output, the answer changes a bit. You'll need a dict comprehension with a nested list comprehension to build the dictionary, looping over the columns and filtering out nan. Andrew has already provided the code for this in his answer, so have a look there.

Upvotes: 1
