Reputation: 2764
I have 2 dataframes.
I want to change data types of second dataframe based on first. Lets suppose I have below data frame which I am using as template.
> template
id <- c(1,2,3,4)
a <- c(1,4,5,6)
b <- as.character(c(0,1,1,4))
c <- as.character(c(0,1,1,0))
d <- c(0,1,1,0)
template <- data.frame(id,a,b,c,d, stringsAsFactors = FALSE)
> str(template)
'data.frame': 4 obs. of 5 variables:
$ id: num 1 2 3 4
$ a : num 1 4 5 6
$ b : chr "0" "1" "1" "4"
$ c : chr "0" "1" "1" "0"
$ d : num 0 1 1 0
I am looking for below things.
**Note- It should add additional columns with all NA's if not available in df.
> df
id <- c(6,7,12,14,1,3,4,4)
a <- c(0,1,13,1,3,4,5,6)
b <- c(1,4,12,3,4,5,6,7)
c <- c(0,0,13,3,4,45,6,7)
e <- c(0,0,13,3,4,45,6,7)
df <- data.frame(id,a,b,c,e)
> str(df)
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : num 1 4 12 3 4 5 6 7
$ c : num 0 0 13 3 4 45 6 7
$ e : num 0 0 13 3 4 45 6 7
Desired output-
> output
id a b c d
1 6 0 1 0 NA
2 7 1 4 0 NA
3 12 13 12 13 NA
4 14 1 3 3 NA
5 1 3 4 4 NA
6 3 4 5 45 NA
7 4 5 6 6 NA
8 4 6 7 7 NA
> str(output)
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : chr "1" "4" "12" "3" ...
$ c : chr "0" "0" "13" "3" ...
$ d : logi NA NA NA NA NA NA ...
My attempts-
template <- fread("template.csv"),header=TRUE,stringsAsFactors = FALSE)
n <- names(template)
template[,(n) := lapply(.SD,function(x) gsub("[^A-Za-z0-90 _/.-]","", as.character(x)))]
n <- names(df)
df[,(n) := lapply(.SD,function(x) gsub("[^A-Za-z0-90 _/.-]","", as.character(x)))]
output <- rbindlist(list(template,df),use.names = TRUE,fill = TRUE,idcol="template")
After this I write the output data frame and then reread using write.csv to get data types. But, I am messing up with data types. Please suggest any appropriate way to deal with it.
Upvotes: 5
Views: 630
Reputation: 66819
I'd do
res = data.frame(
lapply(setNames(,names(template)), function(x)
if (x %in% names(df)) as(df[[x]], class(template[[x]]))
else template[[x]][NA_integer_]
), stringsAsFactors = FALSE)
or with magrittr
library(magrittr)
setNames(, names(template)) %>%
lapply(. %>% {
if (. %in% names(df)) as(df[[.]], class(template[[.]]))
else template[[.]][NA_integer_]
}) %>% data.frame(stringsAsFactors = FALSE)
verifying ...
'data.frame': 8 obs. of 5 variables:
$ id: num 6 7 12 14 1 3 4 4
$ a : num 0 1 13 1 3 4 5 6
$ b : chr "1" "4" "12" "3" ...
$ c : chr "0" "0" "13" "3" ...
$ d : num NA NA NA NA NA NA NA NA
I'd suggest looking at the vetr package if you're going to be doing a lot of stuff like this. It has a good approach to templates for data frames and their columns.
Upvotes: 2
Reputation: 4824
Here's some code that does what you want.
require(tidyverse)
new_types <-
map_df(template, class) %>%
t %>%
as.data.frame(stringsAsFactors = F) %>%
rownames_to_column %>%
setNames(c('col', 'type'))
new_data <- df %>%
gather(col, value) %>%
right_join(new_types, by='col') %>%
group_by(col) %>%
mutate(rownum = row_number()) %>%
ungroup %>%
complete(col, rownum=1:max(rownum)) %>%
group_by(col) %>%
summarize(val = list(value), type=first(type)) %>%
mutate(new_val = map2(val, type, ~as(.x, .y, strict = T))) %>%
select(col, new_val) %>%
spread(col, new_val) %>%
unnest
The main idea here is to use map2()
from the purrr
package to apply the as()
function from base R. This function takes in an object (e.g. a vector or column from a dataframe) and a character string that describes a new type, and returns the coerced object. This is the core capability that you need.
My new_types
dataframe just lists the column names of the template and the (character-string) named of their type in a data frame.
Except for the map2()
line, everything else is mucky data wrangling that could probably be improved.
Some key features:
right_join
here is essential to keep only the columns you want.mutate(rownum = row_number())
to complete(col, rownum=1:max(rownum))
are necessary only when the target df
has columns that aren't in template
-- they make sure that the resulting number of NA
s is the same as for the other columns.Upvotes: 1