Reputation: 57
i have a column looking like this:
Column0
5,00; Today is a good day
3,00; Hello there
8,00; nan
2,00; What a great weather
4,00; nan
2,00; nan
dtypes: object
How do i divide this Column into two separate columns? I want it to look like this:
Column1 Column2
5,00 Today is a good day
3,00 Hello there
8,00
2,00 What a great weather
4,00
2,00
I tried using:
df['Column1']=df.Column0.str.split(';').str[0]
df['Column2']=df.Column0.str.split(';').str[-1]
But that gives me back following:
Column1 Column2
5,00 Today is a good day
3,00 Hello there
8,00 8,00
2,00 What a great weather
4,00 4,00
2,00 2,00
Hello everyone! I ended up using:
df['Column2']=df['Column0'].apply(lambda x: str(x).split(';')[1] if len(str(x).split(';'))>1 else "")
df['Column1']=df.Column0.str.split(';').str[0]
df = df.drop('Column0', axis=1)
The first line gives me the string values in the column with no Null or NaN The second line gives me the numbers
After that i dropped Column0 cause i dont need it anymore. Thanks so much everyone!
Column1 Column2
5,00 Today is a good day
3,00 Hello there
8,00
2,00 What a great weather
4,00
2,00
Upvotes: 0
Views: 217
Reputation: 21747
The reason for this is that some rows only contain one value. For example, the row
5,00; Today is a good day
Will split into
['5,00', 'Today is a good day']
and the line
8,00; nan
will split into
['8,00']
When you access the result using [-1]
as index, you will ask it to give you the last value from the list. Which, if you look at the example above, give you sometimes the number, and sometimes the text, depending if the text was empty or not.
You need to use a default value in the case the second column is empty. To avoid getting a super gnarly line, I would extract this into a simple function:
def extractor(index):
def extract(column):
values = column.split(';')
if index == 0:
return values[0]
if len(values) < 2:
return '' # The default value (could also be injected via the extractor function)
return values[1]
return extract
df['Column1'] = df.Column0.apply(extractor(0))
df['Column2'] = df.Column0.apply(extractor(1))
The following implementation is a bit more explicit and should be more maintainable. Instead of using split
, it uses partition
which fits this use-case better and allows writing code with less "index-access".
Additionally, instead of passing in the index into the extractor I decided to pass in string value (could be improved with an Enum
, but that might be overkill here). The advantage of using a text is that the code becomes more expressive and you don't need to remember that "0" is the numerical value and "1" is the label.
Overall this should improve the readability (and as such maintainability) of the code with minimal changes.
def extractor(field):
def extract(column):
value, _, text = column.partition(';')
if field == 'text':
return text
if field == 'value':
return value
raise ValueError('Unknown Field: %r' % field)
return extract
df['Column1'] = df.Column0.apply(extractor('text'))
df['Column2'] = df.Column0.apply(extractor('value'))
Upvotes: 1
Reputation: 197
You can join everything after the first semicolon like this. I assume df.Column0.str is a string. If there isn't anything after the semicolon it will just be the empty string.
df['Column1']=df.Column0.str.split(';')[0]
df['Column2']="".join(df.Column0.str.split(';')[1:])
Upvotes: 0
Reputation: 638
try
df['Column2']=df['Column0'].apply(lambda x: str(x).split(';')[1] if len(str(x).split(';'))>1 else None)
Upvotes: 0