Reputation: 899
I have this type of dataframe:
df <- data.frame(ID = rep(letters[1:5], each = 2),
DESC = as.character(as.factor(rep(c("Petit", " ", "Small", " ", "Medium", " ", "Large", " ", "X-Large", " "), times = 1))))
Basically, I need to paste the character string in the 'DESC' column with the corresponding 'ID' rows. Ultimately, the result should look like this:
> df
ID DESC
1 a Petit
2 a Petit
3 b Small
4 b Small
5 c Medium
6 c Medium
7 d Large
8 d Large
9 e X-Large
10 e X-Large
Please note my actual dataframe is not this simple. For example, I have identical names in the 'ID' column which vary in the number of rows from 1 to 25 in which I need to paste the value in 'DESC' for that corresponding 'ID.' So, ID 'a' may have 24 rows in 'DESC' in which I need to fill 'Petit' and 'b' may have one row in which I need to fill 'Small.'
I have tried writing scripts including sapply, grep, paste but failed. I tried writing a loop, but it seems that when I point to df$DESC it's stored as a factor, although I forced it to a character vector... Any help, instruction or pointer to the functions that can handle this is greatly appreciated. I know I can simply do it in Excel, but this is beside the point!! I'm trying to learn how to do this in R, and cannot find any help online regarding this subject.
Thanks!
Upvotes: 0
Views: 5530
Reputation: 24955
The forward fill solutions are nice, but if the data is not sorted, we can remove all ' ' rows and duplicates, then merge the result back:
merge(subset(df, select = -DESC),unique(df[df$DESC != ' ',]), by = 'ID')
ID DESC
1 a Petit
2 a Petit
3 b Small
4 b Small
5 c Medium
6 c Medium
7 d Large
8 d Large
9 e X-Large
10 e X-Large
More readable, in multiple steps:
#find mapping
mapping = unique(df[df$DESC != ' ',])
#remove DESC from data
data = subset(df, select = -DESC)
#merge
merge(data, mapping, by = 'ID')
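Note that merge() sorts the result by 'ID' by default, so the original row order is not preserved. If the order matters, the same mapping can be applied with match() instead. This is only a sketch on made-up toy data, assuming blanks are exactly ' ' and each ID has a single non-blank description:

```r
# Toy data (an assumption for illustration): unsorted IDs,
# with blanks before or after the filled row
df <- data.frame(
  ID   = c("b", "a", "b", "a", "c"),
  DESC = c(" ", "Petit", "Small", " ", "Medium"),
  stringsAsFactors = FALSE
)

# One row per ID holding its non-blank description
mapping <- unique(df[df$DESC != " ", ])

# Look up each row's ID in the mapping; row order of df is unchanged
df$DESC <- mapping$DESC[match(df$ID, mapping$ID)]
df
```

An ID with no non-blank row would get NA here, which makes such cases easy to spot.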
Upvotes: 0
Reputation: 887981
Here is an option with dplyr (this assumes the non-blank value is the first row of each ID):
library(dplyr)
df %>%
group_by(ID) %>%
mutate(DESC = first(DESC))
# ID DESC
# <fctr> <fctr>
#1 a Petit
#2 a Petit
#3 b Small
#4 b Small
#5 c Medium
#6 c Medium
#7 d Large
#8 d Large
#9 e X-Large
#10 e X-Large
Or using data.table:
library(data.table)
setDT(df)[, DESC := DESC[1L], by = ID]
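Both first(DESC) and DESC[1L] take whatever sits in the first row of each group. If the non-blank value may appear anywhere within an ID, a base R variant can pick the first non-blank entry per group instead. A minimal sketch on made-up data, assuming DESC is a character vector; fill_desc is a hypothetical helper name, not part of any package:

```r
# Toy data (an assumption for illustration): the non-blank DESC
# is not always the first row within an ID
df <- data.frame(
  ID   = c("a", "a", "b", "b", "b", "c"),
  DESC = c(" ", "Petit", "Small", " ", " ", "Medium"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take the first entry that is not empty or
# whitespace-only and repeat it to the length of the group
fill_desc <- function(x) {
  rep(x[trimws(x) != ""][1], length(x))
}

# ave() applies the helper within each ID and keeps the original row order
df$DESC <- ave(df$DESC, df$ID, FUN = fill_desc)
df
```

A group with no non-blank entry ends up as NA rather than silently keeping the blank.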
Upvotes: 0
Reputation: 19544
If you can use the zoo package:
df[df$DESC==" ","DESC"] <- NA # Correctly code missing values
df$DESC <- zoo::na.locf(df$DESC)
ID DESC
1 a Petit
2 a Petit
3 b Small
4 b Small
5 c Medium
6 c Medium
7 d Large
8 d Large
9 e X-Large
10 e X-Large
Upvotes: 0
Reputation: 1966
If the IDs are sorted with non-blank values in the first position, you can do a simple 'fill' with Reduce:
df$DESC = Reduce(function(x, y) if (y == ' ') x else y, df$DESC, accumulate = TRUE)
> df
# ID DESC
# 1 a Petit
# 2 a Petit
# 3 b Small
# 4 b Small
# 5 c Medium
# 6 c Medium
# 7 d Large
# 8 d Large
# 9 e X-Large
# 10 e X-Large
Upvotes: 2