Reputation: 55
I need your help! I have following df
df <- data.frame("Time,Value"=c(
"32,2,044678E-02",
"32,05,2,044678E-02",
"32,1,2,044678E-02",
"32,15,2,044678E-02"))
the problem is, the separator, here comma "," , is mixed up with the decimal position, here also comma "," ...
I would like to have the following:
df <- data.frame(Time=c(
32, 32.05, 32.1, 32.15),
Value=c(2.044678E-02, 2.044678E-02, 2.044678E-02, 2.044678E-02))
This is only a short example, but I have over 30k lines of this data!!! For those biologist of you: it is an export of the Aglient Bioanalyzer data and it is a mess getting data, mixed up with english and german decimal indicator!
Hope you can help me. I tried several things, but was unsuccessful
I don't know, but maybe a solution could be to identify the last comma and split the character at this point. Example:
32,2,044678E-02 to 32 2,044678E-02
and
32,15,2,044678E-02 to 32,15 2,044678E-02
and ....so forth Best
Upvotes: 1
Views: 129
Reputation: 23889
Here is a way using gsub
and str_split
:
library(stringr)
df <- data.frame("Time,Value"=c(
"32,2,044678E-02",
"32,05,2,044678E-02",
"32,1,2,044678E-02",
"32,15,2,044678E-02"))
str_split(gsub("(.*[,]?),(.*[,]{1})", "\\1;\\2", df$Time.Value), pattern = ";", simplify = T)
[,1] [,2]
[1,] "32" "2,044678E-02"
[2,] "32,05" "2,044678E-02"
[3,] "32,1" "2,044678E-02"
[4,] "32,15" "2,044678E-02"
Now you can convert to a data.frame
with the desired specs.
Upvotes: 3
Reputation: 18585
There are neater solutions already available. I reckon this one would be quite easy to modify due to straightforward use of lapply
.
df <- data.frame(
"Time,Value" = c(
"32,2,044628E-02",
"32,05,2,054678E-02",
"32,1,2,044878E-02",
"32,15,2,044218E-02"
)
)
as.data.frame(do.call(rbind, lapply(
X = strsplit(df$Time.Value, ",", perl = TRUE),
FUN = function(x) {
c(paste(x[1:(length(x) - 2)], collapse = ".")
,
paste0(x[length(x) - 1], ".", x[length(x)]))
}
))) -> df_clean
setNames(object = df_clean,
unlist(strsplit(
names(df), split = ".", fixed = TRUE
))) -> df_clean
head(df_clean)
Upvotes: 0
Reputation: 28685
Note that precision isn't lost, the values are just rounded for the print method.
library(tidyverse)
df$Time.Value %>%
as.character %>%
strsplit(',') %>%
map_dfr(~ {
# apply function to .x, separately for (last 2 comma-delimited groups) and (all others)
tapply(.x, rev(seq_along(.x)) <= 2
, function(x) as.numeric(paste0(x, collapse = '.'))
, simplify = F) %>%
setNames(c('Time', 'Value'))
})
# # A tibble: 4 x 2
# Time Value
# <dbl> <dbl>
# 1 32 0.0204
# 2 32.0 0.0204
# 3 32.1 0.0204
# 4 32.2 0.0204
Upvotes: 2
Reputation: 957
This REGEX should do the trick:
([^",\s].*),(\d+,\d+E-\d+)
Upvotes: 0