C_psy

Reputation: 647

Fill a column's blank spaces contingent on a second column in R

I'd appreciate some help with this one. I have something similar to the data below.

df$A df$B
1    .
1    .
1    .
1    6
2    .
2    .
2    7

What I need to do is fill the blanks in df$B with the value found at the end of each run of identical values in df$A. Example below.

df$A df$B
1    6
1    6
1    6
1    6
2    7
2    7
2    7

Any help would be welcome.

Upvotes: 1

Views: 1094

Answers (2)

Anthony Basooma

Reputation: 21

The simplest approach is to use the fill function from tidyr.

library(tidyr)
library(dplyr) #optional, used for piping

Original data

> df <- data.frame(A = c(1,1,1,1,2,2,2), 
+                  B = c(NA, NA, NA, 6, NA, NA, 7)) 
> 
> print(df)
  A  B
1 1 NA
2 1 NA
3 1 NA
4 1  6
5 2 NA
6 2 NA
7 2  7

Using fill without piping the data

> df_up <- fill(df, B, .direction = 'up')
> print(df_up)
  A B
1 1 6
2 1 6
3 1 6
4 1 6
5 2 7
6 2 7
7 2 7

Using fill with the pipe

> df_up2 <- df %>% fill(B, .direction = 'up')
> print(df_up2)
  A B
1 1 6
2 1 6
3 1 6
4 1 6
5 2 7
6 2 7
7 2 7
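One caveat with the calls above: fill works down the whole column and does not look at A, so it only gives the right answer here because every run of A ends in a non-missing B. The sketch below uses a made-up variant of the data (df_gap is hypothetical, not from the question) where group 1 ends in NA, and compares the ungrouped call with one grouped by A.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant of the data: group 1 ends in a missing value
df_gap <- data.frame(A = c(1, 1, 1, 2, 2, 2),
                     B = c(NA, 6, NA, NA, NA, 7))

# Ungrouped: the trailing NA of group 1 is filled from group 2
ungrouped <- fill(df_gap, B, .direction = "up")

# Grouped: filling stops at the boundaries of each A group
grouped <- df_gap %>%
  group_by(A) %>%
  fill(B, .direction = "up") %>%
  ungroup()

print(ungrouped$B)
print(grouped$B)
```

In the grouped result the trailing NA of group 1 stays NA instead of picking up the 7 from group 2. If that gap should also be filled from within its own group, .direction = "updown" is one option.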

Upvotes: 0

akrun

Reputation: 886928

It seems to me that the missing values are denoted by '.'. It is better to read the dataset with na.strings="." so that the missing values become NA. For the current dataset, the 'B' column would be of character or factor class, depending on whether you used stringsAsFactors=FALSE or TRUE (the default before R 4.0.0) in read.table/read.csv.
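As a minimal sketch of the na.strings suggestion, the snippet below reads the question's data from an inline string. The names raw and df_read are made up for illustration, and the text argument stands in for whatever file the data actually comes from.

```r
# Hypothetical inline text standing in for the file on disk
raw <- "A B
1 .
1 .
1 .
1 6
2 .
2 .
2 7"

# Treat '.' as missing so that B is parsed as a number, not as text
df_read <- read.table(text = raw, header = TRUE, na.strings = ".")

str(df_read)
```

Read this way, B is already numeric with NA in the blank rows, so the as.numeric conversions in the solutions below become unnecessary.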

Using data.table, we convert the data.frame to a data.table (setDT(df1)) and change the 'character' class to 'numeric' (B := as.numeric(B)). This also coerces the '.' entries to NA (a warning will appear). Then, grouped by 'A', we replace every 'B' value with the last element of its group (B := B[.N]).

library(data.table)
setDT(df1)[, B := as.numeric(B)][, B := B[.N], by = A][]  # trailing [] prints the result
#   A B
#1: 1 6
#2: 1 6
#3: 1 6
#4: 1 6
#5: 2 7
#6: 2 7
#7: 2 7

Or with dplyr

library(dplyr)
df1 %>%
     group_by(A) %>%
     mutate(B= as.numeric(tail(B,1)))

Or using ave from base R

df1$B <- with(df1, as.numeric(ave(B, A, FUN=function(x) tail(x,1))))

data

df1 <- structure(list(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(".", 
".", ".", "6", ".", ".", "7")), .Names = c("A", "B"), 
class = "data.frame", row.names = c(NA, -7L))

Upvotes: 1
