Skye
Skye

Reputation: 57

How to divide a column containing int and string values separated by a semicolon

i have a column looking like this:

Column0
5,00; Today is a good day
3,00; Hello there
8,00; nan
2,00; What a great weather
4,00; nan
2,00; nan

dtypes: object

How do i divide this Column into two separate columns? I want it to look like this:

Column1    Column2
5,00       Today is a good day
3,00       Hello there
8,00     
2,00       What a great weather
4,00
2,00

I tried using:

df['Column1']=df.Column0.str.split(';').str[0]
df['Column2']=df.Column0.str.split(';').str[-1]

But that gives me back following:

Column1    Column2
5,00       Today is a good day
3,00       Hello there
8,00       8,00
2,00       What a great weather
4,00       4,00
2,00       2,00

Thanks for your help! :)

Hello everyone! I ended up using:

df['Column2']=df['Column0'].apply(lambda x: str(x).split(';')[1] if len(str(x).split(';'))>1 else "")
df['Column1']=df.Column0.str.split(';').str[0]
df = df.drop('Column0', axis=1)

The first line gives me the string values in the column with no Null or NaN The second line gives me the numbers

After that i dropped Column0 cause i dont need it anymore. Thanks so much everyone!

Column1    Column2
5,00       Today is a good day
3,00       Hello there
8,00     
2,00       What a great weather
4,00
2,00

Upvotes: 0

Views: 217

Answers (3)

exhuma
exhuma

Reputation: 21747

The reason for this is that some rows only contain one value. For example, the row

5,00; Today is a good day

Will split into

['5,00', 'Today is a good day']

and the line

8,00; nan

will split into

['8,00']

When you access the result using [-1] as index, you will ask it to give you the last value from the list. Which, if you look at the example above, give you sometimes the number, and sometimes the text, depending if the text was empty or not.

You need to use a default value in the case the second column is empty. To avoid getting a super gnarly line, I would extract this into a simple function:

def extractor(index):
    def extract(column):
        values = column.split(';')
        if index == 0:
            return values[0]
        if len(values) < 2:
            return ''  # The default value (could also be injected via the extractor function)
        return values[1]
    return extract

df['Column1'] = df.Column0.apply(extractor(0))
df['Column2'] = df.Column0.apply(extractor(1))

Alternative implementation

The following implementation is a bit more explicit and should be more maintainable. Instead of using split, it uses partition which fits this use-case better and allows writing code with less "index-access".

Additionally, instead of passing in the index into the extractor I decided to pass in string value (could be improved with an Enum, but that might be overkill here). The advantage of using a text is that the code becomes more expressive and you don't need to remember that "0" is the numerical value and "1" is the label.

Overall this should improve the readability (and as such maintainability) of the code with minimal changes.

def extractor(field):
    def extract(column):
        value, _, text = column.partition(';')
        if field == 'text':
            return text
        if field == 'value':
            return value
        raise ValueError('Unknown Field: %r' % field)
    return extract

df['Column1'] = df.Column0.apply(extractor('text'))
df['Column2'] = df.Column0.apply(extractor('value'))

Upvotes: 1

Frederik H&#248;jlund
Frederik H&#248;jlund

Reputation: 197

You can join everything after the first semicolon like this. I assume df.Column0.str is a string. If there isn't anything after the semicolon it will just be the empty string.

 df['Column1']=df.Column0.str.split(';')[0]
 df['Column2']="".join(df.Column0.str.split(';')[1:])

Upvotes: 0

David Kong
David Kong

Reputation: 638

try

df['Column2']=df['Column0'].apply(lambda x: str(x).split(';')[1] if len(str(x).split(';'))>1 else None)

Upvotes: 0

Related Questions