Tae Hong
Tae Hong

Reputation: 25

R - replacing blank row values with conditional values from another column

I tried searching and found answers to replacing blank row values with other columns but not on the conditions. Let me explain.

I have a data frame that looks like this:

Name    Grade    Test1    Test2    Test3
John    A        none     none
Jane             B ok     none
David            none     C barely
Sam     B        none
Thomas                             D fail

I would like replace the missing grades in the Grade column with only the letter grades (remove the following comments) from the other columns. There will never be multiple letter grades in Test1/Test2/Test3 columns. So the result I'm loolking for is this:

Name   Grade    Test1    Test2    Test3
John   A        none     none
Jane   B        B ok     none
David  C        none     C barely
Sam    B        none
Thomas D                          D fail

Any help would be appreciated!

Upvotes: 2

Views: 2571

Answers (3)

desc
desc

Reputation: 1210

This worked when I tried it on your data, first take out none from the data frame, then substring for the grade portion of each string, then combine all of the columns into one and generate your final table:

data[data=="none"]=""
A=function(x) substring(x,1,1)
data1=data.frame(data[1],apply(data[2:5],2,a))
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
final.data=data.frame(data1[1:2],data[3:5])
final.data

name   grade   test1    test2    test3
john       A                      
jane       B    B ok                
david      C          C barely       
sam        B                      
thomas     D                    D fail

Upvotes: 0

Peter Ellis
Peter Ellis

Reputation: 5894

I've shamelessly nicked @akrun's data to show an alternative way of doing it that fits in with the split-apply-combine paradigm

# define data
df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

# load up libraries
library(dplyr)
library(tidyr)

# add a primary key
df1 <- df1 %>%
   mutate(PK = 1:nrow(df1))

# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result    
test_result <- df1 %>%
   select(PK, Test1:Test3) %>%
   gather(Variable, Value, -PK) %>%
   mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
   # drop all the unnecessary rows:
   filter(Value != "")

   # join back to the main data, fill in the test score when needed
df1 %>%
   select(PK, Name, Grade) %>%
   left_join(test_result, by = "PK") %>%
   mutate(
      Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
      Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
   select(-Value, - PK, -Variable)

This give you a nice tidy dataset that should be better for future analysis and re-use too:

    Name Grade Source
1   John     A  Grade
2   Jane     B  Test1
3  David     C  Test2
4    Sam     B  Grade
5 Thomas     D  Test3

Upvotes: 1

akrun
akrun

Reputation: 886928

Assuming that the columns are character class, we get the index of 'Grade' elements that are blank ('i1')

i1 <- df1$Grade==''

We loop over the 'Test' columns i.e. columns 3 to 5 using vapply, subset the elements in those columns that have a non-space (\\S) followed by a space (\\s) using grep, remove the space and the characters that follow it with sub and assign the output to the blank elements in 'Grade'.

df1$Grade[i1] <- vapply(df1[i1,3:5], function(x)
    sub('\\s+.*$', '', grep('^\\S\\s', x, value=TRUE)), character(1))
df1
#    Name Grade Test1    Test2  Test3
#1   John     A  none     none       
#2   Jane     B  B ok     none       
#3  David     C  none C barely       
#4    Sam     B  none                
#5 Thomas     D                D fail

data

df1 <-  structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok", 
"none", "none", ""), Test2 = c("none", "none", "C barely", "", 
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name", 
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))

Upvotes: 0

Related Questions