Robert Frey

Reputation: 23

"How to take a list of strings and insert into a new data frame column based on string in another Column?"

I have a data frame of baseball players and would like to add a new column with each player's school next to their name. I have the school names in a vector whose order matches the data frame. I want a loop that moves on to the next school once the Name column reaches the string "Opponents:". What loop do I need to accomplish this?

I've tried using an if/else statement, as well as next, to insert the school into the column.

schools <- c("College of Idaho","Aquinas","Avila","Baker")

df$School <- for (i in nrow(df)) 
if(df$Name!="Opponents:") {
schools[1]
else 
next schools
}

I want my df to look like this:

    Name           School
    Van, Austin   College of Idaho
    Lewis, Payton College of Idaho
    ....
    Opponents:     College of Idaho
    Overbeek, Alec Aquinas
    Haran, Noah    Aquinas

Upvotes: 0

Views: 114

Answers (1)

Gregor Thomas

Reputation: 145805

You've got some issues. The biggest one is that you don't use i inside your loop, so nothing changes on different iterations.

df$School <- for 

This won't work. A for loop always returns NULL (invisibly), so you need to do the assignment inside the loop.
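A quick way to convince yourself of this (a throwaway sketch; the name `result` is just for illustration):

```r
# Capture whatever a for loop "returns"
result <- for (i in 1:3) i * 2

is.null(result)  # TRUE: the loop body's values are discarded
```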

for (i in nrow(df)) 

This is a common typo. nrow(df) is a single number, so the loop runs exactly once, with i set to the last row number. You want for (i in 1:nrow(df)).
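To see the difference, here is a small sketch on a made-up three-row data frame (`toy` and its contents are illustrative, not your data). It uses seq_len(nrow(toy)), which behaves like 1:nrow(toy) but also does the right thing when there are zero rows:

```r
# Hypothetical stand-in for the real data frame
toy <- data.frame(Name = c("A", "B", "C"))

n_wrong <- 0
for (i in nrow(toy)) n_wrong <- n_wrong + 1           # iterates over the single value 3

n_right <- 0
for (i in seq_len(nrow(toy))) n_right <- n_right + 1  # iterates over 1, 2, 3

n_wrong  # 1
n_right  # 3
```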

if(df$Name!="Opponents:") {

Two problems here: (a) df$Name is the whole column, but if() needs a single TRUE or FALSE, so the comparison has to be specific to row i. (b) From your sample results, you still want to assign the school to the "Opponents:" row, so we need to make sure that happens.

schools[1]

This is bad. schools[1] is "College of Idaho". You want this to be able to change to different schools, not always be the first school.

else 
next schools
}

next skips to the next iteration immediately and doesn't take an argument, so the schools after it does nothing useful (as written, next schools won't even parse).

Here's a working for loop (untested, since your data isn't copy/pasteable):

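current_school = 1          # index into the schools vector
df$School = NA_character_   # create the column before filling it row by row
for (i in 1:nrow(df)) {
  df$School[i] = schools[current_school]   # "Opponents:" rows keep the current school
  if (df$Name[i] == "Opponents:") {        # compare only row i, not the whole column
    current_school = current_school + 1    # rows after "Opponents:" get the next school
  }
}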

But we don't like looping. Here's a slicker way: first, we'll count up the "Opponents:" rows cumulatively, then we'll offset it by 1 (so that "Opponents:" rows get the same school as the rows above), and then we can do the assignment all at once:

opp_count = cumsum(df$Name == "Opponents:") + 1  # count "Opponents:" rows, starting from 1
opp_count = c(1, opp_count[-nrow(df)]) # offset by 1
df$School = schools[opp_count] # use this to index the schools vector for assignment
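Since your real data isn't available, here is a self-contained sketch on made-up rows modeled on your expected output (the six names and the two "Opponents:" positions are assumptions for illustration). It runs the vectorized version, then a row-by-row loop that compares only row i, and checks that both give the same answer:

```r
schools <- c("College of Idaho", "Aquinas", "Avila", "Baker")

# Toy data: two teams, each ending with an "Opponents:" row
df <- data.frame(
  Name = c("Van, Austin", "Lewis, Payton", "Opponents:",
           "Overbeek, Alec", "Haran, Noah", "Opponents:"),
  stringsAsFactors = FALSE
)

# Vectorized version
opp_count <- cumsum(df$Name == "Opponents:") + 1  # 1 1 2 2 2 3
opp_count <- c(1, opp_count[-nrow(df)])           # shift down one row: 1 1 1 2 2 2
df$School <- schools[opp_count]

# Loop version, for comparison
loop_school <- character(nrow(df))
current_school <- 1
for (i in seq_len(nrow(df))) {
  loop_school[i] <- schools[current_school]
  if (df$Name[i] == "Opponents:") current_school <- current_school + 1
}

identical(loop_school, df$School)  # TRUE
df
```

The first three rows get "College of Idaho" and the last three get "Aquinas". Note that if the data contains more "Opponents:" blocks than there are entries in schools, the extra rows will come out as NA.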

I haven't tested these solutions because your data isn't in an easy-to-import format. If you share dput(droplevels(df[1:20, c("Name", "School")])), that will give a copy/pasteable version of your data frame, and I'll be happy to test and debug.

Upvotes: 1
