Reputation: 4229
This is very simple yet I cannot get it 100% right!
I have a column of data that looks like this:
"424343 Amsterdam center"
"343423 London 42 ......"
"3434 Prague ........."
"343345 Bratislava ...."
"! last entry ..... 25.08.2014..."
"Berlin"
...
...
I would like to replace every row that starts with a letter with the empty string "".
I have tried:
dataframe$column[grepl("(^[A-Z]+).*",dataframe$column)] <- ""
I'm still getting rows like this one: "! last entry ..... 25.08.2014..."
Desired output:
"424343 Amsterdam center"
"343423 London 42 ......"
"3434 Prague ........."
"343345 Bratislava ...."
""
""
...
...
Upvotes: 3
Views: 6804
Reputation: 263352
This is how I reproduced what I suspect was causing the OP's problem:
> inp <- scan(what="")
1: "424343 Amsterdam center"
2: "343423 London 42 ......"
3: "3434 Prague ........."
4: "343345 Bratislava ...."
5: "! last entry ..... 25.08.2014..."
6: "Berlin"
7:
dat <- data.frame(inp=inp)
And what I suspect he was seeing:
> dat$inp[grepl("(^[A-Z]+).*",dat$inp)] <- ""
Warning message:
In `[<-.factor`(`*tmp*`, grepl("(^[A-Z]+).*", dat$inp), value = c(5L, :
invalid factor level, NA generated
> dat
inp
1 424343 Amsterdam center
2 343423 London 42 ......
3 3434 Prague .........
4 343345 Bratislava ....
5 ! last entry ..... 25.08.2014...
6 <NA>
So the approach I suggested in my comment was one of two possibilities:
dat <- data.frame(inp=inp, stringsAsFactors=FALSE) # option 1: never create the factor
dat$inp <- as.character(dat$inp) # option 2: convert the existing factor to character
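Then the OP's code runs without the warning and blanks the rows that start with a capital letter (row 5 starts with "!", so [A-Z] does not match it):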
> dat$inp[grepl("(^[A-Z]+).*",dat$inp)] <- ""
> dat
inp
1 424343 Amsterdam center
2 343423 London 42 ......
3 3434 Prague .........
4 343345 Bratislava ....
5 ! last entry ..... 25.08.2014...
6
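To make the factor issue easier to check on your own machine, here is a minimal sketch. It sets stringsAsFactors = TRUE explicitly, because since R 4.0.0 data.frame() no longer creates factors by default, so the warning above may not appear on a current installation. The sample vector is copied from the question; the simplified pattern "^[A-Z]" is my shorthand for the OP's "(^[A-Z]+).*" and matches the same rows.

```r
# Sample data from the question.
inp <- c("424343 Amsterdam center", "343423 London 42 ......",
         "3434 Prague .........", "343345 Bratislava ....",
         "! last entry ..... 25.08.2014...", "Berlin")

# Force a factor column to reproduce the situation described above,
# independent of the R version's default.
dat <- data.frame(inp = inp, stringsAsFactors = TRUE)
was_factor <- is.factor(dat$inp)

# Convert to character before assigning a value that is not an existing level.
dat$inp <- as.character(dat$inp)
dat$inp[grepl("^[A-Z]", dat$inp)] <- ""

print(was_factor)
print(dat$inp)
```

After the conversion the assignment is an ordinary character replacement, so no factor-level warning is raised.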
Upvotes: 3
Reputation: 9344
Something like this?
dataframe$column[grepl("^[^0-9]",dataframe$column)] <- ""
Upvotes: 2
Reputation: 25736
You can look just for strings starting with at least one digit and take all non-matching results (using !), e.g.:
!grepl("^[[:digit:]]+", text)
In your example:
dataframe$column[!grepl("^[[:digit:]]+",dataframe$column)] <- ""
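As a rough comparison of the patterns that appear in this thread, the sketch below runs each one against a small vector. The vector x and the variable names are made up for illustration; the empty string and the NA are my additions to show edge cases and were not in the question. Note that the trailing + in "^[[:digit:]]+" does not change which rows grepl() selects, so it is dropped here.

```r
# Illustrative input: three rows from the question plus two edge cases.
x <- c("424343 Amsterdam center",
       "! last entry ..... 25.08.2014...",
       "Berlin",
       "",
       NA)

# The OP's idea: rows starting with a capital letter.
starts_upper <- grepl("^[A-Z]", x)

# Negated match: rows that do NOT start with a digit.
not_digit <- !grepl("^[[:digit:]]", x)

# Negated character class: rows whose first character is a non-digit.
non_digit_first <- grepl("^[^0-9]", x)

# Apply the negated-match version, as in the answer above.
cleaned <- x
cleaned[not_digit] <- ""

print(data.frame(x, starts_upper, not_digit, non_digit_first))
print(cleaned)
```

The two negating approaches agree on ordinary rows but differ at the edges: grepl() returns FALSE for NA and for the empty string, so !grepl(...) selects them (and NA becomes ""), whereas "^[^0-9]" leaves them untouched. Which behaviour is preferable depends on how you want missing values handled.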
Upvotes: 9