Reputation: 183
I have 50 data frames (different name each) with 10 (same name) columns of climate data. The first 5 columns although they are numbers, their class is "character". The rest 4 columns are already in the correct class (numeric) and the last one (named 'wind dir') is in character class so no change is needed.
I tried two ways to convert the class of those 5 columns in all 50 data frames, but nothing worked.
1st way) Firstly I've created a vector with the names of those 50 data frames and I named it onomata.
Secondly I've created a vector col_numbers2 <- c(1:5)
with the number of columns I would like to convert.
Then I wrote the following code:
for(i in onomata){
i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}
Checking the class of those first five columns I saw that nothing changed. (No error report after executing the code)
2nd way) Then I tried to use the dplyr package with a for loop and the code is as follows:
for(i in onomata){
i <- i %>%
mutate_at(vars(-`wind_dir`),as.numeric)
In this case, I excluded the character column, and I applied the mutate function to the whole data frame, but I received an error message :
Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"
What do you think I am doing wrong ?
Thank you
Original data table (what I get when I use read.table() for each txt file:
date | Time | Tdry | Humidity | Wind_velocity | Wind_direction | Wind_gust |
---|---|---|---|---|---|---|
02/01/15 | 02:00 | 2.4 | 77.0 | 6.4 | WNW | 20.9 |
02/01/15 | 03:00 | 2.3 | 77.0 | 11.3 | NW | 30.6 |
02/01/15 | 04:00 | 2.3 | 77.0 | 9.7 | NW | 20.9 |
02/01/15 | 05:00 | 2.3 | 77.0 | 11.3 | NW | 30.6 |
02/01/15 | 06:00 | 2.3 | 78.0 | 9.7 | NW | 19.3 |
02/01/15 | 07:00 | 2.2 | 79.0 | 12.9 | NNW | 35.4 |
02/01/15 | 08:00 | 2.4 | 79.0 | 8.0 | NW | 14.5 |
02/01/15 | 09:00 | 2.6 | 79.0 | 8.0 | WNW | 20.9 |
Data after I split data in columns 1 and 2 (date, time):
day | month | year | Hour | Minutes | Tdry | Humidity | Wind_velocity | Wind_direction | Wind_gust |
---|---|---|---|---|---|---|---|---|---|
02 | 01 | 15 | 02 | 00 | 2.4 | 77.0 | 6.4 | WNW | 20.9 |
02 | 01 | 15 | 03 | 00 | 2.3 | 77.0 | 11.3 | NW | 30.6 |
02 | 01 | 15 | 04 | 00 | 2.3 | 77.0 | 9.7 | NW | 20.9 |
02 | 01 | 15 | 05 | 00 | 2.3 | 77.0 | 11.3 | NW | 30.6 |
02 | 01 | 15 | 06 | 00 | 2.3 | 78.0 | 9.7 | NW | 19.3 |
02 | 01 | 15 | 07 | 00 | 2.2 | 79.0 | 12.9 | NNW | 35.4 |
02 | 01 | 15 | 08 | 00 | 2.4 | 79.0 | 8.0 | NW | 14.5 |
02 | 01 | 15 | 09 | 00 | 2.6 | 79.0 | 8.0 | WNW | 20.9 |
Upvotes: 0
Views: 774
Reputation: 2977
Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list
in the example below). To acheive this you could use mget()
(ex: mget(onomata)
or list.files()
).
Once this is done, you can use lapply
(or mapply
) to go through all your dataframes.
Solution 1
To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.
# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
# Extract elements you need from the date & time column and store them in new columns
df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
return(df)
}
# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)
Adapted from Split date data (m/d/y) into 3 separate columns (this answer)
Data:
# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata: df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))
Outputs:
> lapply(df_list, FUN = fun_split_datehour)
$df1
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33
$df2
date hour datetime year month day min
1 2010-01-02 5 2010-01-02 05:32:00 2010 1 2 32
2 2010-02-03 8 2010-02-03 08:20:00 2010 2 3 20
3 2010-09-10 15 2010-09-10 15:33:00 2010 9 10 33
And columns year
, month
, day
, hour
and min
are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour))
.
Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.
Solution 2
If you do not want to use POSIXct, you could do:
# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))
# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes
Upvotes: 1
Reputation: 76402
Maybe the following code can help.
First, get the filenames with list.files
. Second, read them all in with lapply
. If read.table
is not the appropriate function, read help("read.table")
, it is the same page as for read.csv
, read.csv2
, etc. Then, coerce the first 5 columns of all data.frames to numeric in one go.
filenames <- list.files(path = "your_directory", pattern = "\\.txt")
onomata <- lapply(filenames, read.table)
onomata <- lapply(onomata, function(X){
X[1:5] <- lapply(X[1:5], as.numeric)
X
})
Upvotes: 1