Reputation: 358
I've the following dataframe
Group | Cost | Year |
---|---|---|
A | 21 | 2017 |
A | 22 | 2016 |
A | closed | 2015 |
B | 12 | 2017 |
B | 11 | 2016 |
C | ended | 2017 |
C | closing | 2016 |
C | 13 | 2015 |
Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)
Anybody knows how I can filter the dataframe (df) based on keeping over the numeric values in the Cost column? The outcome should be the following table
Group | Cost | Year |
---|---|---|
A | 21 | 2017 |
A | 22 | 2016 |
B | 12 | 2017 |
B | 11 | 2016 |
C | 13 | 2015 |
Thanks in advance!
Upvotes: 0
Views: 1255
Reputation: 2977
You could use a regular expression to filter the relevant rows of your dataframe.
The regular expression ^\\d+(\\.\\d+)?$
will check for character that contains only digits, possibly with .
as a decimal separator (i.e. 2, 2.3). You could then convert the Cost
column to numeric using as.numeric()
if needed.
See the example below:
Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)
df[grep(pattern = "^\\d+(\\.\\d+)?$", df[,"Cost"]), ]
#> Group Cost Year
#> 1 A 21 2017
#> 2 A 22 2016
#> 4 B 12 2017
#> 5 B 11 2016
#> 8 C 13 2015
Note that this technique works even if your Cost
column is of factor
class while using df[!is.na(as.numeric(df$Cost)), ]
does not. For the latter you need to add as.character()
first: df[!is.na(as.numeric(as.character(df$Cost))), ]
. Both techniques keep factor levels.
Upvotes: 1