Reputation: 11
I'm supposed to combine all four datasets from https://www.drroyspencer.com/latest-global-temperatures/ into one dataframe. I've tried to merge it without luck. Any help?
What I've tried so far.
# Lower troposphere data.
lower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ")
# Mid-Troposphere data.
mid_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tmt/uahncdc_mt_6.0.txt", sep = " ")
# Tropopause data.
trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/ttp/uahncdc_tp_6.0.txt", sep = " ")
# Lower Stratosphere data.
lower_strat <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tls/uahncdc_ls_6.0.txt" , sep = " ")
#
lower_trop1 <- lower_trop[-c(518)]
mid_trop1 <- Mid_trop[-c(518)]
trop1 <- trop[-c(518)]
lower_strat1 <- lower_strat[-c(518)]
#
lower_trop <- as.data.frame(apply(lower_trop1, 2, as.numeric))
mid_trop <- as.data.frame(apply(mid_trop1, 2, as.numeric))
trop_paus <- as.data.frame((apply(trop1, 2, as.numeric)))
lower_strat <- as.data.frame(apply(lower_strat1, 2, as.numeric))
all<- rbind(lower_trop,mid_trop,trop_paus,lower_strat)
Upvotes: 0
Views: 121
Reputation: 160862
Don't ignore warnings. Your fread
calls are warning
lower_trop <- fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " ")
# [100%] Downloaded 89265 bytes...
# Warning in fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", :
# Stopped early on line 520. Expected 29 fields but found 0. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18>>
This is because those files have trailing material that are not "proper CSV" (as far as R cares, that is):
Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST
1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62 -0.59 -0.37 -0.44 -0.30 -0.46 -0.55 -0.45 -0.39 -0.68 -0.06 -0.45 -0.38 -0.49 -1.29 -1.15 -1.29
1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54 -0.45 -0.73 -0.93 -0.55 -0.23 0.12 -0.29 -0.46 -0.95 0.10 -0.16 -0.15 -0.16 -3.22 -2.42 0.92
...snip...
2021 11 0.08 0.22 0.03 0.11 0.32 -0.02 0.06 -0.00 0.07 0.14 0.35 0.08 0.09 0.31 -0.10 0.02 -0.24 0.06 -0.43 -0.19 -0.70 -0.72 -0.67 -0.75 0.50 -0.06 -0.29
2021 12 0.21 0.17 0.22 0.27 0.19 0.32 0.15 0.12 0.15 0.03 0.30 -0.04 0.40 0.19 0.59 0.20 -0.07 0.25 0.01 -0.25 0.31 -0.42 -0.24 -0.51 1.63 1.42 -0.06
Year Mo Globe Land Ocean NH Land Ocean SH Land Ocean Trpcs Land Ocean NoExt Land Ocean SoExt Land Ocean NoPol Land Ocean SoPol Land Ocean USA48 USA49 AUST
Trend 0.14 0.18 0.12 0.16 0.19 0.14 0.11 0.16 0.10 0.12 0.16 0.11 0.18 0.21 0.17 0.10 0.15 0.09 0.25 0.23 0.27 0.01 0.09 -0.02 0.18 0.18 0.18
NOTE: Version 6.0 as of April 2015
NOTE: New Reference for annual cycle 1991-2020
NOTE: Version 5.6 as of Jun 2013
NOTE: 3/13 More accurate grid weighting for USA48
NOTE: 3/13 Alaska added in USA49, Australia added
GL 90S-90N, NH 0-90N, SH 90S-0, TRPCS 20S-20N
NoExt 20N-90N, SoExt 90S-20S, NoPol 60N-90N, SoPol 90S-60S
There is no easy way to just "ignore" those lines, so I suggest you be very clear about it: state that that warning is expected and not a concern here. Your future self (and colleagues) will thank you. You can use this to silence just that warning (and allow fread
to continue normal operations):
lower_trop <- withCallingHandlers(
fread("https://www.nsstc.uah.edu/data/msu/v6.0/tlt/uahncdc_lt_6.0.txt", sep = " "),
warning = function(w) {
if (grepl("Stopped early on line", as.character(w)))
invokeRestart("muffleWarning")
})
R dislikes non-unique column names. I don't know anything about the data, but it appears that Land
and Ocean
are grouped by the preceding columns Globe
, NH
, SH
, etc. Let's rename them to ensure they are unique and unambiguous (not just numbers):
lands <- which(names(lower_trop1) == "Land")
names(lower_trop1)[lands] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands])
names(lower_trop1)[lands+1] <- paste0(names(lower_trop1)[lands-1], "_", names(lower_trop1)[lands+1])
head(lower_trop1)
# Year Mo Globe Globe_Land Globe_Ocean NH NH_Land NH_Ocean SH SH_Land SH_Ocean Trpcs Trpcs_Land
# <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char> <char>
# 1: 1978 12 -0.48 -0.51 -0.47 -0.44 -0.46 -0.42 -0.52 -0.62 -0.50 -0.60 -0.62
# 2: 1979 1 -0.47 -0.64 -0.41 -0.64 -0.86 -0.50 -0.31 -0.13 -0.34 -0.47 -0.54
# 3: 1979 2 -0.43 -0.56 -0.39 -0.47 -0.57 -0.41 -0.39 -0.53 -0.37 -0.36 -0.25
# 4: 1979 3 -0.38 -0.51 -0.33 -0.46 -0.51 -0.44 -0.30 -0.53 -0.26 -0.36 -0.43
# 5: 1979 4 -0.40 -0.57 -0.34 -0.47 -0.62 -0.37 -0.34 -0.46 -0.31 -0.35 -0.37
# 6: 1979 5 -0.40 -0.56 -0.33 -0.52 -0.54 -0.52 -0.27 -0.62 -0.19 -0.46 -0.55
# 16 variables not shown: [Trpcs_Ocean <char>, NoExt <char>, NoExt_Land <char>, NoExt_Ocean <char>, SoExt <char>, SoExt_Land <char>, SoExt_Ocean <char>, NoPol <char>, NoPol_Land <char>, NoPol_Ocean <char>, ...]
(Almost) Never use apply
on a data.frame
. Okay, almost never, there are legitimate times to do it. However, by default, apply
is going to return a matrix
when you really need it to remain a data.frame
(or at least list
-like). Use lapply
instead:
lower_trop1[] <- lapply(lower_trop1, as.numeric)
Two tricks to know about this and how it works with frames:
A data.frame
is just a glorified list
with some added conditions: all elements should be named uniquely; all elements (typically) must be the same length; it may have row-names, some tools use/honor/keep them, some tools (dplyr
) ignore or actively remove them. Oh, and the class
is slightly different. But most things that work on list
s work just as well on data.frame
s, column-wise.
Even though we're working on columns of a frame, in one sense we're just working on elements of a list
, and we want it to stay list
-like, so we should be using lapply
.
Unfortunately, lapply
is going to return a list
not a data.frame
, so the reassignment into lower_trop1[] <- ...
says to replace the contents of lower_trop1
without changing its class. There is likely more that can be said here, but know that frame[] <- lapply(frame, ...)
preserves the data.frame
properties we need while allowing very simple lapply
-operations.
From here, rbind
should work as intended.
FYI, you can combine your middle two steps with:
lower_trop1 <- lower_trop[-518,][, lapply(.SD, as.numeric)]
though you still need to do the renaming for unique columns names.
Upvotes: 1
Reputation: 365
I guess you want some sort of join (so joining the columns from the 4 datasets). Here some information on the different join options in R: https://dplyr.tidyverse.org/reference/mutate-joins.html
Now in your case, you have the Year
and the Mo
columns that appear in your different datasets. So you could join the datasets by these columns. Here an example of a left join (just for the first two dataframes):
First, make sure that you have unique column names in each single dataframe (you do currently not have unique names, so this step is necessary):
names(lower_trop) <- make.unique(names(lower_trop), sep="_")
names(mid_trop) <- make.unique(names(lower_trop), sep="_")
Then you join these with a left join (so you keep all the rows from the left dataset):
new_df <- left_join(lower_trop, mid_trop, by = c("Year", "Mo"), suffix = c(".lower_trop", ".mid_trop"))
Upvotes: 0