Kon Ath
Kon Ath

Reputation: 183

For loop for converting character data to numeric in a data frame

I have 50 data frames (different name each) with 10 (same name) columns of climate data. The first 5 columns although they are numbers, their class is "character". The rest 4 columns are already in the correct class (numeric) and the last one (named 'wind dir') is in character class so no change is needed.

I tried two ways to convert the class of those 5 columns in all 50 data frames, but nothing worked.

1st way) Firstly I've created a vector with the names of those 50 data frames and I named it onomata.

Secondly I've created a vector col_numbers2 <- c(1:5) with the number of columns I would like to convert.

Then I wrote the following code:

for(i in onomata){
  i[col_numbers2] <- sapply(i[col_numbers2], as.numeric)
}

Checking the class of those first five columns I saw that nothing changed. (No error report after executing the code)

2nd way) Then I tried to use the dplyr package with a for loop and the code is as follows:

for(i in onomata){
 i <- i %>%
  mutate_at(vars(-`wind_dir`),as.numeric)

In this case, I excluded the character column, and I applied the mutate function to the whole data frame, but I received an error message :

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "character"

What do you think I am doing wrong ?

Thank you

Original data table (what I get when I use read.table() for each txt file:

date Time Tdry Humidity Wind_velocity Wind_direction Wind_gust
02/01/15 02:00 2.4 77.0 6.4 WNW 20.9
02/01/15 03:00 2.3 77.0 11.3 NW 30.6
02/01/15 04:00 2.3 77.0 9.7 NW 20.9
02/01/15 05:00 2.3 77.0 11.3 NW 30.6
02/01/15 06:00 2.3 78.0 9.7 NW 19.3
02/01/15 07:00 2.2 79.0 12.9 NNW 35.4
02/01/15 08:00 2.4 79.0 8.0 NW 14.5
02/01/15 09:00 2.6 79.0 8.0 WNW 20.9

Data after I split data in columns 1 and 2 (date, time):

day month year Hour Minutes Tdry Humidity Wind_velocity Wind_direction Wind_gust
02 01 15 02 00 2.4 77.0 6.4 WNW 20.9
02 01 15 03 00 2.3 77.0 11.3 NW 30.6
02 01 15 04 00 2.3 77.0 9.7 NW 20.9
02 01 15 05 00 2.3 77.0 11.3 NW 30.6
02 01 15 06 00 2.3 78.0 9.7 NW 19.3
02 01 15 07 00 2.2 79.0 12.9 NNW 35.4
02 01 15 08 00 2.4 79.0 8.0 NW 14.5
02 01 15 09 00 2.6 79.0 8.0 WNW 20.9

Upvotes: 0

Views: 774

Answers (2)

Paul
Paul

Reputation: 2977

Here are two possible ways. Both relies on getting all your files in a list of dataframes (called df_list in the example below). To acheive this you could use mget() (ex: mget(onomata) or list.files()).

Once this is done, you can use lapply (or mapply) to go through all your dataframes.

Solution 1

To transform your data, I propose you 1st convert it into POSIXct format and then extract the relevant elements to make the wanted columns.

# create a custom function that transforms each dataframe the way you want
fun_split_datehour <- function(df){
  
  df[, "datetime"] <- as.POSIXct(paste(df$date, df$hour), format = "%d/%m/%Y %H:%M") # create a POSIXct column with info on date and time
  
  # Extract elements you need from the date & time column and store them in new columns
  df[,"year"] <- as.numeric(format(df[, "datetime"], format = "%Y"))
  df[,"month"] <- as.numeric(format(df[, "datetime"], format = "%m"))
  df[,"day"] <- as.numeric(format(df[, "datetime"], format = "%d"))
  df[,"hour"] <- as.numeric(format(df[, "datetime"], format = "%H"))
  df[,"min"] <- as.numeric(format(df[, "datetime"], format = "%M"))
  
  return(df)
}

# use this function on each dataframe of your list
lapply(df_list, FUN = fun_split_datehour)

Adapted from Split date data (m/d/y) into 3 separate columns (this answer)

Data:

# two dummy dataframe, date and hour format does not matter, you can tell as.POSIXct what to expect using format argument (see ?as.POSIXct)
df1 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                 hour = c("05:32", "08:20", "15:33"))
df2 <- data.frame(date = c("02/01/2010", "03/02/2010", "10/09/2010"),
                  hour = c("05:32", "08:20", "15:33"))
# you can replace c("df1", "df2") with onomata:  df_list <- mget(onomata)
df_list <- mget(c("df1", "df2"))

Outputs:

> lapply(df_list, FUN = fun_split_datehour)
$df1
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

$df2
        date hour            datetime year month day min
1 2010-01-02    5 2010-01-02 05:32:00 2010     1   2  32
2 2010-02-03    8 2010-02-03 08:20:00 2010     2   3  20
3 2010-09-10   15 2010-09-10 15:33:00 2010     9  10  33

And columns year, month, day, hour and min are numeric. You can check using str(lapply(df_list, FUN = fun_split_datehour)).

Note: looking at the question you asked before this one, you might find https://stackoverflow.com/a/24376207/10264278 usefull. In addition, using POSIXct format will save you time if you want to make plots, arrange, etc.


Solution 2

If you do not want to use POSIXct, you could do:

# Dummy data changed to match you situation with already splited date
dfa <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
dfb <- data.frame(day = c("02", "03", "10"),
                  hour = c("05", "08", "15"))
df_list <- mget(c("dfa", "dfb"))

# Same thing, use lapply() to go through each dataframe of the list and apply() to use as.numeric on the wanted columns
lapply(df_list, FUN = function(df){as.data.frame(apply(df[1:2], 2, as.numeric))}) # change df[1:2] to select columns you want to convert in your actual dataframes

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

Maybe the following code can help.
First, get the filenames with list.files. Second, read them all in with lapply. If read.table is not the appropriate function, read help("read.table"), it is the same page as for read.csv, read.csv2, etc. Then, coerce the first 5 columns of all data.frames to numeric in one go.

filenames <- list.files(path = "your_directory", pattern = "\\.txt")
onomata <- lapply(filenames, read.table)

onomata <- lapply(onomata, function(X){
  X[1:5] <- lapply(X[1:5], as.numeric)
  X
})

Upvotes: 1

Related Questions