Martin Neumann
Martin Neumann

Reputation: 55

Split character by identifying the last comma appearing in the character string

I need your help! I have following df

df <- data.frame("Time,Value"=c(
"32,2,044678E-02",
"32,05,2,044678E-02",
"32,1,2,044678E-02",
"32,15,2,044678E-02"))

the problem is, the separator, here comma "," , is mixed up with the decimal position, here also comma "," ...

I would like to have the following:

df <- data.frame(Time=c(
  32,  32.05,  32.1,  32.15),
Value=c(2.044678E-02, 2.044678E-02, 2.044678E-02, 2.044678E-02))

This is only a short example, but I have over 30k lines of this data!!! For those biologist of you: it is an export of the Aglient Bioanalyzer data and it is a mess getting data, mixed up with english and german decimal indicator!

Hope you can help me. I tried several things, but was unsuccessful

I don't know, but maybe a solution could be to identify the last comma and split the character at this point. Example:

32,2,044678E-02      to      32           2,044678E-02
and
32,15,2,044678E-02     to     32,15        2,044678E-02

and ....so forth Best

Upvotes: 1

Views: 129

Answers (4)

Martin Schmelzer
Martin Schmelzer

Reputation: 23889

Here is a way using gsub and str_split:

library(stringr)

df <- data.frame("Time,Value"=c(
  "32,2,044678E-02",
  "32,05,2,044678E-02",
  "32,1,2,044678E-02",
  "32,15,2,044678E-02"))


str_split(gsub("(.*[,]?),(.*[,]{1})", "\\1;\\2", df$Time.Value), pattern = ";", simplify = T)


     [,1]    [,2]          
[1,] "32"    "2,044678E-02"
[2,] "32,05" "2,044678E-02"
[3,] "32,1"  "2,044678E-02"
[4,] "32,15" "2,044678E-02"

Now you can convert to a data.frame with the desired specs.

Upvotes: 3

Konrad
Konrad

Reputation: 18585

There are neater solutions already available. I reckon this one would be quite easy to modify due to straightforward use of lapply.

df <- data.frame(
    "Time,Value" = c(
        "32,2,044628E-02",
        "32,05,2,054678E-02",
        "32,1,2,044878E-02",
        "32,15,2,044218E-02"
    )
)

as.data.frame(do.call(rbind, lapply(
    X = strsplit(df$Time.Value, ",", perl = TRUE),
    FUN = function(x) {
        c(paste(x[1:(length(x) - 2)], collapse = ".")
                     ,
                     paste0(x[length(x) - 1], ".", x[length(x)]))
    }
))) -> df_clean


setNames(object = df_clean,
         unlist(strsplit(
             names(df), split = ".", fixed = TRUE
         ))) -> df_clean


head(df_clean)

Upvotes: 0

IceCreamToucan
IceCreamToucan

Reputation: 28685

Note that precision isn't lost, the values are just rounded for the print method.

library(tidyverse)

df$Time.Value %>% 
  as.character %>% 
  strsplit(',') %>% 
  map_dfr(~ {
    # apply function to .x, separately for (last 2 comma-delimited groups) and (all others)
    tapply(.x, rev(seq_along(.x)) <= 2 
           , function(x) as.numeric(paste0(x, collapse = '.'))
           , simplify = F) %>% 
      setNames(c('Time', 'Value'))
  })

# # A tibble: 4 x 2
#    Time  Value
#   <dbl>  <dbl>
# 1  32   0.0204
# 2  32.0 0.0204
# 3  32.1 0.0204
# 4  32.2 0.0204

Upvotes: 2

tk78
tk78

Reputation: 957

This REGEX should do the trick:

([^",\s].*),(\d+,\d+E-\d+)
  • Group1 is your first number
  • Group2 is the second/scientific number

Upvotes: 0

Related Questions