Thiago

Reputation: 672

Split a character column from a dataframe based on specific token

I have a dataframe df and the first column looks like this:

[1] "760–563" "01455–1" "4672–04" "11–31234" "22–12" "11111–53" "111–21" "17–356239" "14–22352" "531–353"

I want to split that column on -.

What I'm doing is

strsplit(df[,1], "-")

The problem is that it's not working. It returns a list without splitting the elements. I already tried adding the parameter fixed = TRUE and passing a regular expression as the split parameter, but nothing worked.

What is weird is that if I replicate the column on my own, for example:

myVector <- c("760–563", "01455–1", "4672–04", "11–31234", "22–12", "11111–53", "111–21", "17–356239", "14–22352", "531–353")

and then apply the strsplit, it works.

I already checked my column type and class with

class(df[,1]) and typeof(df[,1]), and both return character, so that's fine.

I was also using the dataframe with dplyr, so it was of type tbl_df. I converted it back to a plain dataframe, but that didn't work either.

I also tried apply(df, 2, function(x) strsplit(x, "-", fixed = T)), but that didn't work either.

Any clues?

Upvotes: 2

Views: 1756

Answers (2)

thelatemail

Reputation: 93813

I don't know how you did it, but you have two different types of dashes:

charToRaw(substr("760–563", 4, 4))
#[1] 96
charToRaw("-")
#[1] 2d

So strsplit() is working just fine; the hyphen you are splitting on simply isn't in your original data. Split on the dash that is actually there, and away you go:

strsplit("760–563", "–")
#[[1]]
#[1] "760" "563"
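
If you are not sure which dash a column contains, or it mixes both, something along these lines may help. It is only a sketch: the vector x is made-up sample data, and the en dash is written as the escape \u2013 so the example does not depend on how the script file is encoded.

```r
# Made-up sample values: two use an en dash (U+2013), one an ASCII hyphen.
x <- c("760\u2013563", "01455\u20131", "22-12")

# Collect every distinct non-digit character and look up its code point.
# 8211 is the en dash (U+2013); 45 is the ASCII hyphen-minus.
seps <- unique(unlist(regmatches(x, gregexpr("[^0-9]", x))))
codes <- sapply(seps, utf8ToInt)

# Split on either kind of dash. The hyphen goes last inside the
# brackets so it is read as a literal character, not a range.
parts <- strsplit(x, "[\u2013-]")
```

Checking the code points first shows exactly which characters are acting as separators before you decide what to split on.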

Upvotes: 5

bramtayl

Reputation: 4024

You can just split on any non-numeric character, which works no matter which kind of dash is in the data:

library(dplyr)
library(tidyr)

data %>%
  separate(your_column, 
           c("first_number", "second_number"),
           sep = "[^0-9]")
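
For a version you can run as-is, here is the same call on a small made-up data frame. The data frame and its values are assumptions for illustration; your_column simply mirrors the placeholder name used above.

```r
library(tidyr)

# Hypothetical data: one value with an en dash (U+2013), one with a hyphen.
data <- data.frame(your_column = c("760\u2013563", "22-12"),
                   stringsAsFactors = FALSE)

# Any single non-digit character is treated as the separator.
result <- separate(data, your_column,
                   c("first_number", "second_number"),
                   sep = "[^0-9]")
```

The new columns stay character. separate() also has a convert = TRUE argument that would turn them into numbers, but that would drop leading zeros from values like "01455".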

Upvotes: 2
