For loop for converting character data to numeric in a data frame

Question

I have 50 data frames (different name each) with 10 (same name) columns of climate data. The first 5 columns although they are numbers, their class is "character". The rest 4 columns are already in the correct class (numeric) and the last one (named 'wind dir') is in character class so no change is needed.

I tried two ways to convert the class of those 5 columns in all 50 data frames, but nothing worked.

1st way) Firstly I've created a vector with the names of those 50 data frames and I named it onomata.

Secondly I've created a vector col_numbers2 <- c(1:5) with the number of columns I would like to convert.

Then I wrote the following code:

for(i in onomata){
  i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}

Checking the class of those first five columns I saw that nothing changed. (No error report after executing the code)

2nd way) Then I tried to use the dplyr package with a for loop and the code is as follows:

for(i in onomata){
 i <- i %>%
  mutate_at(vars(-`wind_dir`),as.numeric)

In this case, I excluded the character column, and I applied the mutate function to the whole data frame, but I received an error message :

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"

What do you think I am doing wrong ?

Thank you

Original data table (what I get when I use read.table() for each txt file:

date	Time	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02/01/15	02:00	2.4	77.0	6.4	WNW	20.9
02/01/15	03:00	2.3	77.0	11.3	NW	30.6
02/01/15	04:00	2.3	77.0	9.7	NW	20.9
02/01/15	05:00	2.3	77.0	11.3	NW	30.6
02/01/15	06:00	2.3	78.0	9.7	NW	19.3
02/01/15	07:00	2.2	79.0	12.9	NNW	35.4
02/01/15	08:00	2.4	79.0	8.0	NW	14.5
02/01/15	09:00	2.6	79.0	8.0	WNW	20.9

Data after I split data in columns 1 and 2 (date, time):

day	month	year	Hour	Tdry	Humidity	Wind_velocity	Wind_direction	Wind_gust
02	01	15	02	2.4	77.0	6.4	WNW	20.9
02	01	15	03	2.3	77.0	11.3	NW	30.6
02	01	15	04	2.3	77.0	9.7	NW	20.9
02	01	15	05	2.3	77.0	11.3	NW	30.6
02	01	15	06	2.3	78.0	9.7	NW	19.3
02	01	15	07	2.2	79.0	12.9	NNW	35.4
02	01	15	08	2.4	79.0	8.0	NW	14.5
02	01	15	09	2.6	79.0	8.0	WNW	20.9

Paul · Accepted Answer

Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list in the example below). To acheive this you could use mget() (ex: mget(onomata) or list.files()).

Once this is done, you can use lapply (or mapply) to go through all your dataframes.

Solution 1

To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
  
  df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
  
  # Extract elements you need from the date & time column and store them in new columns
  df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
  df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
  df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
  df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
  df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
  
  return(df)
}

# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                 hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata:  df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

$df2
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.

Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes

For loop for converting character data to numeric in a data frame

Answers (2)

Related Questions