Luther_Proton
Luther_Proton

Reputation: 358

Filtering Dataframe by keeping numeric values of a specific column only in R

I've the following dataframe

Group Cost Year
A 21 2017
A 22 2016
A closed 2015
B 12 2017
B 11 2016
C ended 2017
C closing 2016
C 13 2015
Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)

Anybody knows how I can filter the dataframe (df) based on keeping over the numeric values in the Cost column? The outcome should be the following table

Group Cost Year
A 21 2017
A 22 2016
B 12 2017
B 11 2016
C 13 2015

Thanks in advance!

Upvotes: 0

Views: 1255

Answers (1)

Paul
Paul

Reputation: 2977

You could use a regular expression to filter the relevant rows of your dataframe. The regular expression ^\\d+(\\.\\d+)?$ will check for character that contains only digits, possibly with . as a decimal separator (i.e. 2, 2.3). You could then convert the Cost column to numeric using as.numeric() if needed.

See the example below:

Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)


df[grep(pattern = "^\\d+(\\.\\d+)?$", df[,"Cost"]), ]
#>   Group Cost Year
#> 1     A   21 2017
#> 2     A   22 2016
#> 4     B   12 2017
#> 5     B   11 2016
#> 8     C   13 2015

Note that this technique works even if your Cost column is of factor class while using df[!is.na(as.numeric(df$Cost)), ] does not. For the latter you need to add as.character() first: df[!is.na(as.numeric(as.character(df$Cost))), ]. Both techniques keep factor levels.

Upvotes: 1

Related Questions