Reputation: 41
I copied and pasted weather information from the Weather Underground website for some data analysis. The data comes from the following page:
https://www.wunderground.com/dashboard/pws/KCACHINO13/table/2018-04-10/2018-04-10/daily
As you can see, the temperature and the other columns all contain text along with the numbers, so I cannot do any calculations. In Excel, I used substitute(xx,"F","") to remove the F from the "Temperature" column, but when I then tried to convert Fahrenheit to Celsius with convert(xx,"F","C"), I could not get a result. I think there is something wrong with the data itself. I tried formatting the cells as numbers and also copying and pasting the values into a new column, but neither worked.
Then I imported the data into R as a data.frame and tried to do the formatting there. I checked the class of the Temperature column, which is "character":
class(a$Temperature)
#"character"
a$Temperature <- gsub("F","",a$Temperature)
# this command removed "F"
as.numeric(a$Temperature)
#Warning message: NAs introduced by coercion
as.numeric(unlist(a$Temperature))
#still the same warning message
In Excel, I also created a new column with the "F" removed from the temperature and used that column in R to convert "character" to "numeric", but I still got the warning message. I don't know how to deal with this problem. Could someone help me with this? Thank you!
As recommended below, I am pasting the output from
dput(head(a))
#structure(list(Time = structure(c(-2209075140, -2209074840, -2209074540,
-2209074240, -2209073940, -2209073640), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Temperature = c("60.0 ", "59.9 ", "59.8 ", "59.7 ",
"59.6 ", "59.5 "), `T(F)` = c("60.0 ", "59.9 ", "59.8 ", "59.7 ",
"59.6 ", "59.5 "), `Dew Point` = c("48.2 F", "48.1 F", "48.4 F",
"48.3 F", "48.2 F", "48.1 F"), Humidity = c("65 %", "65 %", "66 %",
"66 %", "66 %", "66 %"), Wind = c("WSW", "WSW", "WSW", "WSW",
"WSW", "WSW"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.88 in", "29.88 in",
"29.88 in", "29.88 in", "29.88 in", "29.88 in"), `Precip. Rate.` = c("0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), `Precip. Accum.` = c("0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0,
0, 0, 0, 0, 0), Solar = c("0 w/m²", "0 w/m²", "0 w/m²", "0 w/m²",
"0 w/m²", "0 w/m²")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 959
Reputation: 11
Maybe this helps. This line nests a few functions: gsub, which here replaces commas with an empty string, and as.numeric(as.character(...)), which converts the result from character to numeric. You would just replace the comma with the letter you want to remove. I have never tried it with letters, but it might be one solution. Here is the code:
data666 <- apply(data, 2, function(x) as.numeric(as.character(gsub(",", "", x))))
The apply function applies a function over the whole dataset. The 2 means that it works column by column. If you want to go row by row, change the 2 to 1.
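One caveat with the one-liner: apply returns a matrix rather than a data frame, and text columns such as Wind would turn into NA. Below is a minimal sketch of a variant that only touches hand-picked columns. The helper name strip_units and the sample data are hypothetical, and the sample assumes the hidden character after the number is a non-breaking space, which the question does not confirm.

```r
# Hypothetical sample mimicking the question's data; "\u00a0" is a
# non-breaking space, assumed here to be the hidden trailing character.
a <- data.frame(
  Temperature = c("60.0\u00a0F", "59.9\u00a0F"),
  Humidity    = c("65\u00a0%", "66\u00a0%"),
  Wind        = c("WSW", "WSW"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop everything except digits, decimal points
# and minus signs, then convert what is left to numeric.
strip_units <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

# Columns are chosen by hand so that text columns such as Wind stay as they are.
num_cols <- c("Temperature", "Humidity")
a[num_cols] <- lapply(a[num_cols], strip_units)

str(a)
```

Removing every character that is not part of a number, instead of removing just the letter "F", also takes care of any invisible leftovers without having to know exactly what they are.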
Upvotes: 0
Reputation: 1364
If you want to convert only the Temperature column, here is an option you may consider.
Data
df <- structure(list(Time = c("12:04 AM", "12:09 AM", "12:14 AM", "12:19 AM",
"12:24 AM", "12:29 AM"), Temperature = c("69.4 F", "69.2 F",
"68.8 F", "68.5 F", "68.3 F", "68.0 F"), Dew.Point = c("45.9 F",
"46.0 F", "45.8 F", "45.7 F", "45.7 F", "45.7 F"), Humidity = c("43 %",
"43 %", "44 %", "44 %", "44 %", "45 %"), Wind = c("NE", "NE",
"NE", "NE", "NE", "NE"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph",
"0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.93 in",
"29.94 in", "29.94 in", "29.95 in", "29.95 in", "29.95 in"),
Precip..Rate. = c("0.00 in", "0.00 in", "0.00 in", "0.00 in",
"0.00 in", "0.00 in"), Precip..Accum. = c("0.00 in", "0.00 in",
"0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0L, 0L,
0L, 0L, 0L, 0L), Solar = c("0 w/m²", "0 w/m²", "0 w/m²",
"0 w/m²", "0 w/m²", "0 w/m²")), class = "data.frame", row.names = c(NA,
-6L))
Code
library(dplyr)
library(stringr)
df2 <- df %>%
  mutate(Temperature2 = as.numeric(str_extract(Temperature, "[\\d\\.]+"))) %>%
  relocate(Temperature2, .after = Temperature)
df2[, 2:3]
#   Temperature Temperature2
# 1      69.4 F         69.4
# 2      69.2 F         69.2
# 3      68.8 F         68.8
# 4      68.5 F         68.5
# 5      68.3 F         68.3
# 6      68.0 F         68.0
str(df2$Temperature2)
# num [1:6] 69.4 69.2 68.8 68.5 68.3 68
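Since the question's end goal was Celsius, here is a small sketch of that last step once the column is numeric. The helper f_to_c is a hypothetical name, not something from a package; it uses the standard formula C = (F - 32) * 5 / 9, and the input vector simply repeats the first three Temperature2 values shown above.

```r
# Hypothetical helper: Fahrenheit to Celsius, C = (F - 32) * 5 / 9.
f_to_c <- function(f) (f - 32) * 5 / 9

# First three values of Temperature2 from the output above.
temp_f <- c(69.4, 69.2, 68.8)

# Round to one decimal place to match the precision of the source data.
temp_c <- round(f_to_c(temp_f), 1)
temp_c
# [1] 20.8 20.7 20.4
```

Because the function is vectorised, it should also work inside mutate on the whole column, for example mutate(TemperatureC = f_to_c(Temperature2)), where TemperatureC is just an illustrative column name.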
Upvotes: 2