cardamom

Reputation: 7421

Replace every None in a pandas DataFrame with a tuple of Nones

I am working on code for an NLP application. A solution on Stack Overflow creates a DataFrame from lists of unequal lengths. Using the code from that solution with tuples as the input:

import pandas as pd
import itertools

aa = [('aa1',4), ('aa2',3), ('aa3',2), ('aa4',2), ('aa5',1)]
bb = [('bb1',8), ('bb2',6), ('bb3',4), ('bb4',4)]
cc = [('cc1',3), ('cc2',2), ('cc3',1)]
nest = [aa, bb, cc]

df = pd.DataFrame(list(itertools.zip_longest(*nest)), columns=['aa', 'bb', 'cc'])
df

you get a dataframe which looks like this:

[Image: the resulting DataFrame; the cells past the end of the shorter lists contain None]

A subsequent step requires all elements in the data frame to be tuples.

I have tried this:

df.replace({None : (None,None)})

While it seems to run without error, it does not carry out any replacement. Any ideas how to accomplish this?

Upvotes: 2

Views: 1083

Answers (2)

Stefan Falk

Reputation: 25457

One way to do it would be using pandas.DataFrame.apply() and pandas.Series.map() like this:

df = df.apply(lambda ds: ds.map(lambda x: x if x is not None else (None, None)))
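For reference, here is a minimal self-contained sketch of the same apply/map idea, run against the data from the question. The helper name `fill_missing` and the variable `filled` are only illustrative and are not part of pandas.

```python
import itertools

import pandas as pd

aa = [('aa1', 4), ('aa2', 3), ('aa3', 2), ('aa4', 2), ('aa5', 1)]
bb = [('bb1', 8), ('bb2', 6), ('bb3', 4), ('bb4', 4)]
cc = [('cc1', 3), ('cc2', 2), ('cc3', 1)]
nest = [aa, bb, cc]

# zip_longest pads the shorter lists with None
df = pd.DataFrame(list(itertools.zip_longest(*nest)), columns=['aa', 'bb', 'cc'])


def fill_missing(cell, filler=(None, None)):
    """Return the filler tuple for a None cell, otherwise the cell unchanged."""
    return filler if cell is None else cell


# Apply the helper column by column; the result is a new DataFrame,
# so it has to be assigned to a name to be kept.
filled = df.apply(lambda col: col.map(fill_missing))

# Check that every cell is now a tuple
all_tuples = all(isinstance(cell, tuple) for cell in filled.to_numpy().ravel())
print(all_tuples)
```

Because neither apply nor map modifies the frame in place, the original `df` still contains None after this runs.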

Upvotes: 2

linpingta

Reputation: 2620

This seems to work, although I am not sure why. Note that it fills the gaps with the string '(None, None)' rather than an actual tuple:

df = df.where(df!=[None], '(None, None)')

It may be better to use np.nan rather than None as the missing-value marker in a DataFrame, because fillna can then be used.
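In the same spirit of choosing the missing-value marker up front, here is a sketch that assumes the frame is built with itertools.zip_longest as in the question. Its `fillvalue` argument can supply the tuple directly, so no replacement step is needed afterwards. The name `df_padded` is only illustrative.

```python
import itertools

import pandas as pd

aa = [('aa1', 4), ('aa2', 3), ('aa3', 2), ('aa4', 2), ('aa5', 1)]
bb = [('bb1', 8), ('bb2', 6), ('bb3', 4), ('bb4', 4)]
cc = [('cc1', 3), ('cc2', 2), ('cc3', 1)]
nest = [aa, bb, cc]

# Pad the shorter lists with (None, None) instead of the default None
rows = list(itertools.zip_longest(*nest, fillvalue=(None, None)))
df_padded = pd.DataFrame(rows, columns=['aa', 'bb', 'cc'])
print(df_padded)
```

This only helps when you control how the DataFrame is created; for a frame that already contains None, a per-cell replacement is still needed.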

Upvotes: 2
