Reputation: 25
I tried searching and found answers to replacing blank row values with other columns but not on the conditions. Let me explain.
I have a data frame that looks like this:
Name Grade Test1 Test2 Test3
John A none none
Jane B ok none
David none C barely
Sam B none
Thomas D fail
I would like replace the missing grades in the Grade column with only the letter grades (remove the following comments) from the other columns. There will never be multiple letter grades in Test1/Test2/Test3 columns. So the result I'm loolking for is this:
Name Grade Test1 Test2 Test3
John A none none
Jane B B ok none
David C none C barely
Sam B none
Thomas D D fail
Any help would be appreciated!
Upvotes: 2
Views: 2571
Reputation: 1210
This worked when I tried it on your data
, first take out none from the data frame, then substring for the grade portion of each string, then combine all of the columns into one and generate your final table:
data[data=="none"]=""
A=function(x) substring(x,1,1)
data1=data.frame(data[1],apply(data[2:5],2,a))
all.grades=paste(data1$grade,data1$test1,data1$test2,data1$test3,sep="")
data1$grade=all.grades
final.data=data.frame(data1[1:2],data[3:5])
final.data
name grade test1 test2 test3
john A
jane B B ok
david C C barely
sam B
thomas D D fail
Upvotes: 0
Reputation: 5894
I've shamelessly nicked @akrun's data to show an alternative way of doing it that fits in with the split-apply-combine paradigm
# define data
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
# load up libraries
library(dplyr)
library(tidyr)
# add a primary key
df1 <- df1 %>%
mutate(PK = 1:nrow(df1))
# turn the test results into tidy format, first by making long and skinny
# and then by bringing it back to one entry per person who has a test result
test_result <- df1 %>%
select(PK, Test1:Test3) %>%
gather(Variable, Value, -PK) %>%
mutate(Value = ifelse(Value == "none", "", substring(Value, 1, 1))) %>%
# drop all the unnecessary rows:
filter(Value != "")
# join back to the main data, fill in the test score when needed
df1 %>%
select(PK, Name, Grade) %>%
left_join(test_result, by = "PK") %>%
mutate(
Source = ifelse(Grade %in% LETTERS, "Grade", as.character(Variable)),
Grade = ifelse(Grade %in% LETTERS, Grade, Value)) %>%
select(-Value, - PK, -Variable)
This give you a nice tidy dataset that should be better for future analysis and re-use too:
Name Grade Source
1 John A Grade
2 Jane B Test1
3 David C Test2
4 Sam B Grade
5 Thomas D Test3
Upvotes: 1
Reputation: 886928
Assuming that the columns are character
class, we get the index of 'Grade' elements that are blank ('i1')
i1 <- df1$Grade==''
We loop over the 'Test' columns i.e. columns 3 to 5 using vapply
, subset the elements in those columns that have a non-space (\\S
) followed by a space (\\s
) using grep
, remove the space and the characters that follow it with sub
and assign the output to the blank elements in 'Grade'.
df1$Grade[i1] <- vapply(df1[i1,3:5], function(x)
sub('\\s+.*$', '', grep('^\\S\\s', x, value=TRUE)), character(1))
df1
# Name Grade Test1 Test2 Test3
#1 John A none none
#2 Jane B B ok none
#3 David C none C barely
#4 Sam B none
#5 Thomas D D fail
df1 <- structure(list(Name = c("John", "Jane", "David", "Sam", "Thomas"
), Grade = c("A", "", "", "B", ""), Test1 = c("none", "B ok",
"none", "none", ""), Test2 = c("none", "none", "C barely", "",
""), Test3 = c("", "", "", "", "D fail")), .Names = c("Name",
"Grade", "Test1", "Test2", "Test3"), class = "data.frame",
row.names = c(NA, -5L))
Upvotes: 0