user2502338

Using R, how do I fill empty cells in column B of a data frame with the previous row's value, based on the value in column A?

I have this type of dataframe:

df <- data.frame(ID = rep(letters[1:5], each = 2),
                 DESC = as.character(as.factor(rep(c("Petit", " ", "Small", " ", "Medium", " ", "Large", " ", "X-Large", " "), times = 1))))

Basically, I need to fill each blank cell in the 'DESC' column with the description that belongs to the same 'ID'. The result should look like this:

> df
   ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large

Please note my actual data frame is not this simple. For example, the same value in the 'ID' column can repeat over anywhere from 1 to 25 rows, and each of those rows needs the 'DESC' value for that 'ID'. So ID 'a' may have 24 rows that need to be filled with 'Petit', while 'b' may have only one row that needs 'Small'.

I have tried writing scripts with sapply, grep and paste, but failed. I also tried writing a loop, but when I refer to df$DESC it seems to be stored as a factor, even though I forced it to a character vector... Any help, instructions, or pointers to functions that can handle this would be greatly appreciated. I know I could simply do this in Excel, but that is beside the point! I'm trying to learn how to do it in R and cannot find any help online on this subject.

Thanks!
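The factor behaviour described above is most likely caused by data.frame() itself: before R 4.0.0 its stringsAsFactors argument defaulted to TRUE, so the as.character() call inside the constructor is undone when the column is stored. A minimal sketch, assuming the same toy data, that keeps DESC as a character vector on any R version:

```r
# Build the example data with DESC kept as a character vector.
# stringsAsFactors = FALSE is passed explicitly because its default was
# TRUE before R 4.0.0 and FALSE from R 4.0.0 onwards.
df <- data.frame(
  ID   = rep(letters[1:5], each = 2),
  DESC = c("Petit", " ", "Small", " ", "Medium", " ",
           "Large", " ", "X-Large", " "),
  stringsAsFactors = FALSE
)

# If a column has already been stored as a factor, convert it back
# (a no-op here, since DESC is already character).
df$DESC <- as.character(df$DESC)

is.character(df$DESC)  # TRUE
```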

Answers (4)

jeremycg

The forward-fill solutions are nice, but if the data are not sorted, we can drop all ' ' rows and duplicates, then merge the result back:

merge(subset(df, select = -DESC),unique(df[df$DESC != ' ',]), by = 'ID')

   ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large

More readable, in multiple steps:

# find the ID -> DESC mapping (non-blank rows only)
mapping <- unique(df[df$DESC != ' ', ])

# remove DESC from the data
data <- subset(df, select = -DESC)

# merge the mapping back on ID
merge(data, mapping, by = 'ID')
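To check that the merge approach does not depend on row order, here is a small sketch on hypothetical unsorted data (the values are made up for illustration) with uneven group sizes and the non-blank value not always in the first row of its group:

```r
# Hypothetical unsorted input: rows of each ID are not adjacent and the
# non-blank DESC is not always the first row of its group.
df <- data.frame(
  ID   = c("b", "a", "a", "c", "b", "a", "c"),
  DESC = c(" ", " ", "Petit", "Medium", "Small", " ", " "),
  stringsAsFactors = FALSE
)

# One row per ID holding its non-blank description
mapping <- unique(df[df$DESC != " ", ])

# Drop DESC and join the descriptions back on ID
data <- subset(df, select = -DESC)
result <- merge(data, mapping, by = "ID")
result
```

Note that merge() sorts the result by the key (sort = TRUE by default), so the original row order is not kept, and an ID with no non-blank DESC at all is dropped unless all.x = TRUE is passed.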

akrun

Here is an option with dplyr

library(dplyr)
df %>% 
  group_by(ID) %>%
  mutate(DESC = first(DESC))
#      ID    DESC
#   <fctr>  <fctr>
#1       a   Petit
#2       a   Petit
#3       b   Small
#4       b   Small
#5       c  Medium
#6       c  Medium
#7       d   Large
#8       d   Large
#9       e X-Large
#10      e X-Large

Or using data.table

library(data.table)
setDT(df)[, DESC := DESC[1L], by = ID]
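Both first(DESC) and DESC[1L] copy whatever sits in the first row of each group, so they assume the non-blank description comes first. When that is not guaranteed, one option (a base-R swap, not part of the answer above) is to pick the first non-blank value per group with ave(). A minimal sketch, assuming blanks are stored as a single space and DESC is character:

```r
# Same hypothetical unsorted data as a character column
df <- data.frame(
  ID   = c("b", "a", "a", "c", "b", "a", "c"),
  DESC = c(" ", " ", "Petit", "Medium", "Small", " ", " "),
  stringsAsFactors = FALSE
)

# For each ID, take the first DESC that is not a single space and
# recycle it over every row of that group; row order is unchanged.
# FUN must be named, because ave() treats unnamed extra arguments as
# grouping variables.
df$DESC <- ave(df$DESC, df$ID, FUN = function(v) v[v != " "][1])
df$DESC
```

In the dplyr pipeline, the analogous change would be mutate(DESC = first(DESC[DESC != " "])).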

HubertL

If you can use the zoo package:

df[df$DESC==" ","DESC"] <- NA    # Correctly code missing values
df$DESC <- zoo::na.locf(df$DESC)

   ID    DESC
1   a   Petit
2   a   Petit
3   b   Small
4   b   Small
5   c  Medium
6   c  Medium
7   d   Large
8   d   Large
9   e X-Large
10  e X-Large
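zoo::na.locf() implements "last observation carried forward" (LOCF). For readers who cannot install zoo, the same idea can be sketched in base R by tracking, for each position, the index of the most recent non-NA value. The helper name locf is hypothetical, and unlike na.locf's default (na.rm = TRUE, which drops leading NAs) it keeps leading NAs so the output has the same length as the input:

```r
# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  # Position of the most recent non-NA element at or before each index;
  # 0 where no non-NA value has been seen yet.
  idx <- cummax(seq_along(x) * !is.na(x))
  # Leading positions with nothing to carry stay NA
  idx[idx == 0] <- NA
  x[idx]
}

desc <- c("Petit", " ", "Small", " ", "Medium", " ")
desc[desc == " "] <- NA   # code blanks as missing first
locf(desc)
```

Like the na.locf() call in the answer, this fills down the whole column and ignores ID groups, so it relies on each group's non-blank value coming first.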

sirallen

If the rows are sorted by ID with the non-blank value first in each group, you can do a simple forward fill with Reduce:

df$DESC <- Reduce(function(x, y) if (y == ' ') x else y, df$DESC, accumulate = TRUE)

> df
#    ID    DESC
# 1   a   Petit
# 2   a   Petit
# 3   b   Small
# 4   b   Small
# 5   c  Medium
# 6   c  Medium
# 7   d   Large
# 8   d   Large
# 9   e X-Large
# 10  e X-Large
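Reduce(..., accumulate = TRUE) returns every intermediate result of folding the function over the vector, which is what turns the two-argument "keep the previous value if blank" rule into a forward fill. A small sketch (fill_blank is a hypothetical name for the anonymous function above) that also shows why the answer requires sorted data with the non-blank value first:

```r
# Keep the previously accumulated value when the current one is blank,
# otherwise take the current value.
fill_blank <- function(prev, cur) if (cur == " ") prev else cur

x <- c("Petit", " ", " ", "Small", " ")
filled <- Reduce(fill_blank, x, accumulate = TRUE)
filled  # "Petit" "Petit" "Petit" "Small" "Small"

# Reduce knows nothing about ID groups: if a group starts with a blank,
# it inherits the previous group's value. Here the first "b" row
# receives "Petit" instead of "Small".
ids   <- c("a", "a", "b", "b")
desc  <- c("Petit", " ", " ", "Small")
wrong <- Reduce(fill_blank, desc, accumulate = TRUE)
data.frame(ID = ids, DESC = wrong)
```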
