User06
User06

Reputation: 35

How do I fill in values for columns based on matching few other column's row values in R

Data looks like below.

 time <- c('Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000', 'Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000')
A <- c('20.79','NA','NA','NA','21.8','NA','NA')  
B <- c('NA','97.017','94.321','85.014','NA','87.1','67.1')
C <- c('NA','C1','C2','C3','NA','C1','C2')
D <- c('L1','L1','L1','L1','L2','L2','L2')
C1 <- c('NA','NA','NA','NA','NA','NA','NA')
C2 <- c('NA','NA','NA','NA','NA','NA','NA')
C3 <- c('NA','NA','NA','NA','NA','NA','NA')
df <- data.frame(time,A,B,C,D,C1,C2,C3)

I need output in the below format.

#   time                           A     B  C    D        C1     C2      C3
# 1 Nov 1st 2014, 17:36:50.000  20.79    NA NA   L1       97.02  94.321  85.014
    Nov 1st 2014, 17:37:50.000  21.8     NA NA   L2       87.1   67.1   47.3

How do I get the data in the above format in just one row as columns "time" and "D" are same for all the rows?

Thanks in advance!

Upvotes: 3

Views: 112

Answers (2)

Uwe
Uwe

Reputation: 42544

Step-by-step approach

If I understand correctly, OP's dataset actually consists of two intermixed datasets:

df
                        time     A      B  C  D C1 C2 C3
1 Nov 1st 2014, 17:36:50.000 20.79     NA NA L1 NA NA NA
2 Nov 1st 2014, 17:36:50.000    NA 97.017 C1 L1 NA NA NA
3 Nov 1st 2014, 17:36:50.000    NA 94.321 C2 L1 NA NA NA
4 Nov 1st 2014, 17:36:50.000    NA 85.014 C3 L1 NA NA NA
5 Nov 1st 2014, 17:37:50.000  21.8     NA NA L2 NA NA NA
6 Nov 1st 2014, 17:37:50.000    NA   87.1 C1 L2 NA NA NA
7 Nov 1st 2014, 17:37:50.000    NA   67.1 C2 L2 NA NA NA

which need to be separated:

library(data.table)
df1 <- setDT(df)[A != "NA", .(time, A, D)]
df1
                         time     A  D
1: Nov 1st 2014, 17:36:50.000 20.79 L1
2: Nov 1st 2014, 17:37:50.000  21.8 L2

and

df2 <- df[A == "NA", .(time, B, C, D)]
df2
                         time      B  C  D
1: Nov 1st 2014, 17:36:50.000 97.017 C1 L1
2: Nov 1st 2014, 17:36:50.000 94.321 C2 L1
3: Nov 1st 2014, 17:36:50.000 85.014 C3 L1
4: Nov 1st 2014, 17:37:50.000   87.1 C1 L2
5: Nov 1st 2014, 17:37:50.000   67.1 C2 L2

The key columns which identify unique subsets of rows are time and D. Columns C1, C2, and C3 are dropped as they will be created in the next step.

The second dataset is to be reshaped from long to wide format:

wide <- dcast(df2, time + D ~ C, value.var = "B")
wide
                         time  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000 L2   87.1   67.1   <NA>

Now both partial results can be joined together:

df1[wide, on = .(time, D)]
                         time     A  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 20.79 L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000  21.8 L2   87.1   67.1   <NA>

Note that columns B and C have been dropped from the result as they convey no information.

Compact code

This steps above can be combined into fewer statements:

library(data.table)
setDT(df)[, (paste0("C", 1:3)) := NULL]
df[A != "NA"][dcast(df[C != "NA"], time + D ~ C, value.var = "B"), on = .(time, D)]
                         time     A  B  C  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 20.79 NA NA L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000  21.8 NA NA L2   87.1   67.1   <NA>

Data

as provided by the OP with NA values given as strings

time <- c('Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000', 'Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000')
A <- c('20.79','NA','NA','NA','21.8','NA','NA')  
B <- c('NA','97.017','94.321','85.014','NA','87.1','67.1')
C <- c('NA','C1','C2','C3','NA','C1','C2')
D <- c('L1','L1','L1','L1','L2','L2','L2')
C1 <- c('NA','NA','NA','NA','NA','NA','NA')
C2 <- c('NA','NA','NA','NA','NA','NA','NA')
C3 <- c('NA','NA','NA','NA','NA','NA','NA')
df <- data.frame(time,A,B,C,D,C1,C2,C3)

Upvotes: 0

Paul
Paul

Reputation: 2959

You can do this with dplyr::gather() to re-shape B into C1, C2, C3, and then dplyr::join() it with the other columns, assuming a unique date/time.

library(dplyr)
library(tidyr)

df %>%
  select(time, A, B, C, D) %>%
  filter(!is.na(A)) %>%
  left_join(
    df %>%
      select(time, C, B, D) %>%
      spread(C, B) %>%
      select(-`<NA>`),
    by = c("time", "D")
  )

#                         time     A  B    C  D     C1     C2     C3
# 1 Nov 1st 2014, 17:36:50.000 20.79 NA <NA> L1 97.017 94.321 85.014
# 2 Nov 1st 2014, 17:37:50.000 21.80 NA <NA> L2 87.100 67.100 47.300

data

df <- read.table(text = "time A B C D C1 C2 C3
1 'Nov 1st 2014, 17:36:50.000' 20.79 NA NA L1 NA NA NA
2 'Nov 1st 2014, 17:36:50.000' NA 97.017 C1 L1 NA NA NA
3 'Nov 1st 2014, 17:36:50.000' NA 94.321 C2 L1 NA NA NA
4 'Nov 1st 2014, 17:36:50.000' NA 85.014 C3 L1 NA NA NA
5 'Nov 1st 2014, 17:37:50.000' 21.8 NA NA L2 NA NA NA
6 'Nov 1st 2014, 17:37:50.000' NA 87.1 C1 L2 NA NA NA
7 'Nov 1st 2014, 17:37:50.000' NA 67.1 C2 L2 NA NA NA
8 'Nov 1st 2014, 17:37:50.000' NA 47.3 C3 L2 NA NA NA",
                 header = T,
                 stringsAsFactors = F)

Upvotes: 1

Related Questions