Reputation: 137
I have 25 data sets each is structured the same. Each contains many rows and 7 columns. Column 6 contains data that should be numerical but is not numerical. They are not numerical because the numbers contain commas i.e. 100000 is 100,000.
I can manually resolve this in each data set by removing the comma and then specifying that the data is numerical using the following code
df$column_6 <- gsub("[,]" , "", df$column_6)
df$column_6 <- as.numerical(df$column_6)
However as there are 25 data sets I would like to loop through them doing this however I am unable to do this.
Additionally because column 6 has a different name in each data set I would prefer to specify column 6 without using its name like below
df[6] <- gsub("[,]" , "", df[6])
however this doesn't seem to work.
My code is as follows
list_of_dfs = c(df1, df2, ..... , df25)
for (i in list_of_dfs) {
i[6] <- gsub("[,]" , "", i[6])
i[6] <- as.numerical(i[6])
}
Does anyone have any advice on how to do this
Upvotes: 0
Views: 254
Reputation: 591
The data table way
test<-data.table(col1=c('100,00','100','100,000'),col2=c('90','80,00','60'))
col1 col2
100,00 90
100 80,00
100,000 60
your list of data frames
testList<-list(test,test)
assume u want to correct col2 in this case but want to use index as reference
removeNonnumeric<-function(x){return(as.numeric(gsub(',','',x)))}
data<-function(x){return(x[,lapply(.SD,removeNonnumeric),.SDcols=names(x)[2],by=col1])}
removeNonnumeirc removes the "," from the columns and data accesses each data table in the testList and calls "removeNonnumeric" on them output is a list of data tables which is created by merging these 2 functions in an "lapply"
lapply(testList,data)
Upvotes: 0
Reputation: 145765
Your code is close, but has a few problems:
as.numerical
is a typo, it needs to be as.numeric
i[6]
doesn't work because you need to specify that it's the 6th column you want: i[, 6]
. See here for details on [
vs [[
.c(df1, df2)
doesn't actually create a list of data framesTry this instead:
## this is bad, it will make a single list of columns, not of data frames
# list_of_dfs = c(df1, df2, ..... , df25)
# use this instead
list_of_dfs = list(df1, df2, ..... , df25)
# or this
list_of_dfs = mget(ls(pattern = "df"))
for (i in seq_along(list_of_dfs)) {
list_of_dfs[[i]][, 6] <- as.numeric(gsub("[,]" , "", list_of_dfs[[i]][, 6]))
}
We can do a bit better, gsub
uses pattern-matching regular expressions by default, using the fixed = TRUE
argument instead will be quite a bit faster:
for (i in seq_along(list_of_dfs)) {
list_of_dfs[[i]][, 6] <- as.numeric(gsub(",", "", list_of_dfs[[i]][, 6], fixed = TRUE))
}
And we could use lapply
instead of a for
loop for slightly shorter code:
list_of_dfs[[i]] <- lapply(list_of_dfs, function(x) {
x[, 6] = as.numeric(gsub("," , "", x[, 6], fixed = TRUE))
return(x)
})
Upvotes: 2
Reputation: 8110
Try this out. You put all dataframes in a list, then you make the column numeric. Instead of gsub
I use readr::parse_number
. I'll also include a practice set for illustration.
library(tidyverse)
df1 <- data_frame(id = rep(1,3), num = c("10,000", "11,000", "12,000"))
df2 <- data_frame(id = rep(2,3), num = c("13,000", "14,000", "15,000"))
df3 <- data_frame(id = rep(3,3), num = c("16,000", "17,000", "18,000"))
list(df1, df2, df3) %>% map(~mutate(.x, num = parse_number(num)))
#> [[1]]
#> # A tibble: 3 x 2
#> id num
#> <dbl> <dbl>
#> 1 1 10000
#> 2 1 11000
#> 3 1 12000
#>
#> [[2]]
#> # A tibble: 3 x 2
#> id num
#> <dbl> <dbl>
#> 1 2 13000
#> 2 2 14000
#> 3 2 15000
#>
#> [[3]]
#> # A tibble: 3 x 2
#> id num
#> <dbl> <dbl>
#> 1 3 16000
#> 2 3 17000
#> 3 3 18000
Created on 2018-09-20 by the reprex package (v0.2.0).
Upvotes: 1
Reputation: 1376
Part of the answer has been sourced from here: Looping through list of data frames in R
In your case, you can do the following:
list_of_dfs = list(df1, df2, ..... , df25)
lapply(list_of_dfs, function(x) { x[, 6] <- as.integer(gsub("," , "", x[, 6])) })
Upvotes: 0