Reputation: 641
EDIT: jezrael had the right answer for the question I asked below. Unfortunately for me, I asked the wrong question. As it turns out, the problem was that the lists of strings in the DataFrame column contained None elements, which is where the error was coming from. Please see the answer I have added for the code I used to fix this.
SECOND EDIT: jezrael has updated his answer to do what I did, but more succinctly, in a lambda expression.
I select a column of a DataFrame and call apply on it, passing a lambda expression that contains a conditional (if/else). I understand that at this point the column is treated as a Series.
The column is made up of strings and lists of strings. I want to convert each list to a plain string by joining its elements and replacing the list with the result, so that the DataFrame column contains only strings.
Relevant code:
raw_data.address = raw_data.address.fillna('')
At this point I have looped through the entire address column and added all the types to a set; the only elements in that set are str and list.
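A check of this kind might look like the sketch below. It also inspects the types of the items inside each list, which a top-level check does not cover. The sample data is made up for illustration, and raw_data here is only a stand-in for the real frame.

```python
import pandas as pd

# Hypothetical sample data standing in for the real address column.
raw_data = pd.DataFrame(
    {"address": [["1 Main St", None, "Springfield"], "PO Box 7", ""]}
)

# Types of the top-level values: only str and list show up here.
outer_types = {type(value) for value in raw_data["address"]}

# Types of the items inside the lists: this is where NoneType can hide.
inner_types = {
    type(item)
    for value in raw_data["address"]
    if isinstance(value, list)
    for item in value
}

print(outer_types)
print(inner_types)
```

With this sample, the outer set contains only str and list, while the inner set also contains NoneType.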
raw_data.address.apply(lambda x: x if type(x) == str else ' '.join(x))
and
raw_data.address.apply(lambda x: x if isinstance(x, str) else ' '.join(x))
do not work.
This is the error message in both cases:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-32-5e2dce775d20> in <module>
----> 1 raw_data.address.apply(lambda x: x if type(x) == str else ' '.join(x))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
3589 else:
3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
3592
3593 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-32-5e2dce775d20> in <lambda>(x)
----> 1 raw_data.address.apply(lambda x: x if type(x) == str else ' '.join(x))
TypeError: sequence item 0: expected str instance, NoneType found
I don't understand why this doesn't work. My understanding is that the syntax is correct.
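The last line of the traceback comes from str.join rather than from apply or from the lambda syntax: join raises TypeError as soon as it meets an item that is not a str. A minimal sketch that reproduces the same message outside pandas, using a made-up list:

```python
# Hypothetical list with a None among the strings.
parts = ["a", None, "b"]

message = ""
try:
    " ".join(parts)
except TypeError as exc:
    # str.join refuses any item that is not a str.
    message = str(exc)

print(message)
```

The item index in the message is the position of the first offending element within the list, not a row number in the DataFrame.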
Upvotes: 1
Views: 2356
Reputation: 641
As it turns out, the problem was that the lists in the DataFrame themselves contained None elements. To solve this, instead of using a lambda function in apply, I wrote a normal function that uses the built-in function filter to remove the None values from the lists:
def make_strings(thing):
    if isinstance(thing, list):
        # filter(None, ...) drops None (and any other falsy item) before joining
        return ' '.join(filter(None, thing))
    else:
        return str(thing)
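For completeness, a sketch of how the function can be passed to apply. The DataFrame contents are invented for the example, and the function is repeated so that the snippet runs on its own.

```python
import pandas as pd


def make_strings(thing):
    if isinstance(thing, list):
        # Drop None (and other falsy items) before joining.
        return " ".join(filter(None, thing))
    return str(thing)


# Hypothetical data: one list containing None, one plain string, one empty string.
raw_data = pd.DataFrame({"address": [["a", None, "b"], "c", ""]})

# Pass the function itself to apply; no lambda is needed.
raw_data["address"] = raw_data["address"].apply(make_strings)

result = raw_data["address"].tolist()
print(result)
```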
Upvotes: 0
Reputation: 862771
Test for list values and remove the None values before joining:
import pandas as pd

raw_data = pd.DataFrame({'address': [['a', 'b', None], 'c']})
print(raw_data)
        address
0  [a, b, None]
1             c
raw_data.address = (raw_data.address
                            .apply(lambda x: ' '.join(filter(None, x))
                                             if isinstance(x, list)
                                             else x))
print(raw_data)
   address
0      a b
1        c
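One caveat: filter(None, x) drops every falsy item, so empty strings inside a list are removed along with None. If only None should be dropped, an explicit "is not None" test is one option. This is a sketch rather than part of the original answer; join_non_null is a hypothetical helper name and the data is made up.

```python
import pandas as pd

# Hypothetical data: the list holds an empty string as well as a None.
raw_data = pd.DataFrame({"address": [["a", "", None, "b"], "c"]})


def join_non_null(value):
    """Join list items, skipping only None; leave non-list values unchanged."""
    if isinstance(value, list):
        return " ".join(item for item in value if item is not None)
    return value


joined = raw_data["address"].apply(join_non_null).tolist()
print(joined)
```

Here the empty string is kept, so the first row joins to "a", "" and "b" with a double space between "a" and "b", whereas filter(None, ...) would give "a b".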
Upvotes: 1