How to split a string at uppercase letters from a dataframe column?

Question

I want to split the strings in this DataFrame column wherever an uppercase letter starts a new word.

Unicainstancia_DF['TesteNomeJuiz']
0          ClinicadeOlhosSaoPauloLtda-Me
1        PatriciaAparecidaMendesFerreira
2        CarraroHoldingParticipaçõesLtda
3               IsadoraCentofantiFonseca
4       Petruso&PetrusoSupermercadosLtda
....
Name: TesteNomeJuiz, Length: 1510, dtype: object

I already tried a function that should do this, but it doesn't seem to work:

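from re import finditer

def camel_case_split(identifier):
    # Match lazily up to a lower->upper boundary, an acronym boundary, or the end
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]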
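The camel_case_split helper itself works on a single string; the catch is that Python's re functions accept one string at a time. A minimal sketch, assuming pandas is installed and using two rows copied from the column above, applies the helper element-wise with Series.apply:

```python
import re

import pandas as pd


def camel_case_split(identifier):
    # Lazily match up to a lowercase->uppercase boundary, an "ABc"-style
    # acronym boundary, or the end of the string.
    matches = re.finditer(
        '.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier
    )
    return [m.group(0) for m in matches]


# Two rows copied from the question's column.
s = pd.Series(["PatriciaAparecidaMendesFerreira", "IsadoraCentofantiFonseca"])

# re works on one string at a time, so call the helper once per cell.
result = s.astype(str).apply(camel_case_split)
print(result.tolist())
```

Each cell becomes a Python list of pieces; assigning the result to a new column (for example a hypothetical Unicainstancia_DF['NomePartes']) keeps it next to the original.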

Unicainstancia_DF['TesteNomeJuiz'].astype('str')
splitted = re.sub('([A-Z][a-z]+)', r' \1', re.sub('([A-Z]+)', r' \1', Unicainstancia_DF['TesteNomeJuiz'])).split

TypeError                                 Traceback (most recent call last)
 in 
1 Unicainstancia_DF['TesteNomeJuiz'].astype('str')
--> 2 splitted = re.sub('([A-Z][a-z]+)', r' \1', re.sub('([A-Z]+)', r' \1', Unicainstancia_DF['TesteNomeJuiz'])).split
F:\Anaconda\lib\re.py in sub(pattern, repl, string, count, flags)
208     a callable, it's passed the Match object and must return
209     a replacement string to be used."""
--> 210     return _compile(pattern, flags).sub(repl, string, count)
211 
212 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
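The TypeError comes from handing the whole Series to re.sub, which only accepts a single str or bytes object. Two smaller issues in the same snippet: astype('str') returns a new Series, so calling it without assigning the result changes nothing, and .split without parentheses refers to the method instead of calling it. A sketch of the same two substitutions done element-wise with pandas' Series.str.replace (sample rows copied from above, pandas assumed installed):

```python
import re

import pandas as pd

s = pd.Series(["PatriciaAparecidaMendesFerreira", "Petruso&PetrusoSupermercadosLtda"])

# Passing the whole Series to re.sub reproduces the error from the question.
try:
    re.sub("([A-Z]+)", r" \1", s)
except TypeError as exc:
    print("TypeError:", exc)

# Same two substitutions, applied to every cell by pandas' string accessor;
# regex=True makes str.replace treat the pattern as a regular expression.
spaced = (
    s.astype(str)
    .str.replace("([A-Z]+)", r" \1", regex=True)
    .str.replace("([A-Z][a-z]+)", r" \1", regex=True)
)
splitted = spaced.str.split()  # no argument: split on runs of whitespace
print(splitted.tolist())
```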

I also tried calling the info() method, but that doesn't work either:

Unicainstancia_DF['TesteNomeJuiz'].info()
AttributeError                            Traceback (most recent call last)
 in 
--> 1 Unicainstancia_DF['TesteNomeJuiz'].info()
F:\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
5273                 return self[name]
->5274             return object.__getattribute__(self, name)
5275 
5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'info'

kerasbaz · Accepted Answer

In the pandas version you're running, .info() is only available on a pandas.DataFrame, not on a pandas.Series. (Newer pandas releases also provide Series.info().)

Assuming Unicainstancia_DF is a DataFrame, you can call Unicainstancia_DF.info(), but not Unicainstancia_DF['TesteNomeJuiz'].info().

When you write Unicainstancia_DF['TesteNomeJuiz'], you're using a column selector: it picks a single column (a Series) out of the DataFrame, and everything you chain after it operates on that Series.
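To inspect a single column without relying on Series.info(), one option is to turn it back into a one-column DataFrame with to_frame(), or to read the basic facts directly. A small sketch with a made-up two-row frame (the None value is illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"TesteNomeJuiz": ["IsadoraCentofantiFonseca", None]})
col = df["TesteNomeJuiz"]  # column selector: returns a Series

print(type(df).__name__)   # DataFrame
print(type(col).__name__)  # Series

# DataFrame.info() works on the whole frame...
df.info()
# ...and a single column can be wrapped back into a one-column DataFrame.
col.to_frame().info()

# Basic facts about the column without info(): dtype, non-null count, length.
print(col.dtype, col.count(), len(col))
```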

What, precisely, you want to do with that Series isn't clear to me from your example. If you want to split on A-Z, then you could do something like this:

import re

print([re.split(r'[A-Z]', x) for x in Unicainstancia_DF['TesteNomeJuiz']])
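For illustration, here is what splitting on [A-Z] does to one of the names from the question: every capital is treated as a delimiter and removed from the output.

```python
import re

name = "IsadoraCentofantiFonseca"

# Each capital letter is consumed as a separator, so it disappears,
# and an empty string appears before the leading "I".
pieces = re.split(r"[A-Z]", name)
print(pieces)  # ['', 'sadora', 'entofanti', 'onseca']
```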

But as Chris suggests, if you clarify your expected output and where you want to store the splits, I can be more specific. It seems doubtful that you actually want to split on A-Z; more likely, you want to split on the boundary between A-Z and any other character. Is that the case?
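If the goal is to split at the boundary before each capital instead, one sketch (an assumption about the desired output, not something the asker confirmed) keeps each capital together with the text that follows it, using Series.str.findall. The column names NomePartes and NomeSeparado are hypothetical:

```python
import pandas as pd

# Sample rows copied from the question; the real column has 1510 rows.
df = pd.DataFrame({"TesteNomeJuiz": [
    "ClinicadeOlhosSaoPauloLtda-Me",
    "PatriciaAparecidaMendesFerreira",
    "CarraroHoldingParticipaçõesLtda",
    "IsadoraCentofantiFonseca",
    "Petruso&PetrusoSupermercadosLtda",
]})

# Each piece starts at an ASCII capital and runs until the next capital.
# Any text before the first capital would be dropped by this pattern.
parts = df["TesteNomeJuiz"].astype(str).str.findall(r"[A-Z][^A-Z]*")

# Hypothetical output columns: one holding the list, one a spaced string.
df["NomePartes"] = parts
df["NomeSeparado"] = parts.str.join(" ")
print(df["NomeSeparado"].tolist())
```

Any rule based only on capitals has limits: lowercase connecting words stay merged ("Clinicade"), "Ltda-Me" becomes "Ltda-" and "Me", and because [A-Z] covers only ASCII capitals, a word starting with an accented capital such as "É" would not be split off.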
