Maximilian

Reputation: 4229

Select string starting with numbers in R

This is very simple yet I cannot get it 100% right!

I have a column of data that looks like this:

"424343 Amsterdam center" 
"343423 London 42 ......"
"3434   Prague ........." 
"343345 Bratislava ...."
"! last entry ..... 25.08.2014..."
"Berlin"
...
...

I would like to replace every row that does not start with a number with an empty string ""

I have tried:

dataframe$column[grepl("(^[A-Z]+).*",dataframe$column)] <- ""

I'm still getting rows like this one: "! last entry ..... 25.08.2014..."

Desired output:

 "424343 Amsterdam center" 
 "343423 London 42 ......"
 "3434   Prague ........." 
 "343345 Bratislava ...."
 ""
 ""
...
...

Upvotes: 3

Views: 6804

Answers (3)

IRTFM

Reputation: 263352

This is how I reproduced what I suspect was causing problems for the OP:

> inp <- scan(what="")
1: "424343 Amsterdam center" 
2: "343423 London 42 ......"
3: "3434   Prague ........." 
4: "343345 Bratislava ...."
5: "! last entry ..... 25.08.2014..."
6: "Berlin"
7: 

dat <- data.frame(inp=inp)

And what I suspect he was seeing:

> dat$inp[grepl("(^[A-Z]+).*",dat$inp)] <- ""
Warning message:
In `[<-.factor`(`*tmp*`, grepl("(^[A-Z]+).*", dat$inp), value = c(5L,  :
  invalid factor level, NA generated
> dat
                               inp
1          424343 Amsterdam center
2          343423 London 42 ......
3          3434   Prague .........
4           343345 Bratislava ....
5 ! last entry ..... 25.08.2014...
6                             <NA>

So the approach I was suggesting in my comment was one of two possibilities:

dat <- data.frame(inp=inp, stringsAsFactors=FALSE)  # option 1: never create the factor
dat$inp <- as.character(dat$inp)         # option 2: convert the existing factor to character

Then the OP's code runs without the warning and blanks the "Berlin" row (row 5 remains, because it does not start with a capital letter):

> dat$inp[grepl("(^[A-Z]+).*",dat$inp)] <- ""
> dat
                               inp
1          424343 Amsterdam center
2          343423 London 42 ......
3          3434   Prague .........
4           343345 Bratislava ....
5 ! last entry ..... 25.08.2014...
6                                 
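
To see that the pattern matters as well as the column type, here is a minimal sketch. It assumes a shortened three-row copy of the sample data and borrows the leading-digit test from the other answers; it is not part of the approach above. The column is built as a factor on purpose, to mimic the default before R 4.0.0 (since then, data.frame() defaults to stringsAsFactors = FALSE).

```r
# Build the column as a factor explicitly, mimicking the pre-4.0 default.
dat <- data.frame(inp = c("3434   Prague .........",
                          "! last entry ..... 25.08.2014...",
                          "Berlin"),
                  stringsAsFactors = TRUE)

# Convert the factor to character so that "" can be assigned without a warning.
dat$inp <- as.character(dat$inp)

# "^[A-Z]" only flags rows that begin with an uppercase letter,
# so the row starting with "!" is not selected by it.
grepl("^[A-Z]", dat$inp)

# Negating a leading-digit test flags every row that does not start with a digit.
blank <- !grepl("^[0-9]", dat$inp)
dat$inp[blank] <- ""
dat$inp
```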

Upvotes: 3

Robert Krzyzanowski

Reputation: 9344

Something like this?

dataframe$column[grepl("^[^0-9]",dataframe$column)] <- ""
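
A quick check of this pattern on a plain character vector may help; the vector x below is made up for illustration. One detail worth knowing is that "^[^0-9]" needs at least one character to match, so empty strings and NA values are left as they are.

```r
# Hypothetical test vector: a numeric start, two non-numeric starts,
# an empty string and a missing value.
x <- c("3434   Prague", "! last entry", "Berlin", "", NA)

# TRUE where the first character exists and is not a digit.
hit <- grepl("^[^0-9]", x)
hit

# Blank only the flagged elements; "" and NA are not flagged.
x[hit] <- ""
x
```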

Upvotes: 2

sgibb

Reputation: 25736

You can test for strings that start with at least one digit and then select everything that does not match (using !), e.g.:

!grepl("^[[:digit:]]+", text)

In your example:

dataframe$column[!grepl("^[[:digit:]]+",dataframe$column)] <- ""
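
As a rough end-to-end sketch, assuming a small character column that also contains a missing value (the NA row is an addition for illustration, not part of the question's data): grepl() returns FALSE for NA, so the negated test also turns NA into "". If missing values should be kept, they would need to be excluded separately.

```r
# Hypothetical data frame using the names from the question.
dataframe <- data.frame(
  column = c("424343 Amsterdam center",
             "! last entry ..... 25.08.2014...",
             "Berlin",
             NA),
  stringsAsFactors = FALSE
)

# TRUE for rows that begin with one or more digits; NA yields FALSE.
keep <- grepl("^[[:digit:]]+", dataframe$column)

# Blank every row that was not flagged, including the NA row.
dataframe$column[!keep] <- ""
dataframe$column
```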

Upvotes: 9
