I have a dataframe with multiple columns like this: df <- data.frame( Level = c( 'Midwest', 'Wisconsin: Good', 'Wisconsin: Neutral', 'Wisconsin: Bad', NA, 'Minnesota: Good', 'Minnesota: Neutral', 'Minnesota: Bad', NA, 'New England', 'New Hampshire: Good', 'New Hampshire: Neutral', 'New Hampshire: Bad', NA, 'Vermont: Good', 'Vermont: Neutral', 'Vermont: Bad' ), TotalPct = c( NA, '75%', '71%', '65%', NA, '82%', '76%', '53%', NA, NA, '68%', '98%', '45%', NA, '79%', '93%', '48%' ) ) I'm interested in doing two things to this dataframe: Grab the first instance of the Good level for each state and place it in a new row about the Good level, and... Remove the state names from each level so it just shows Good , Neutral , and Bad below each state name. The result would look like this: df_desired <- data.frame( Level = c( 'Midwest', 'Wisconsin', 'Good', 'Neutral', 'Bad', NA, 'Minnesota', 'Good', 'Neutral', 'Bad', NA, 'New England', 'New Hampshire', 'Good', 'Neutral', 'Bad', NA, 'Vermont', 'Good', 'Neutral', 'Bad' ), TotalPct = c( NA, NA, '75%', '71%', '65%', NA, NA, '82%', '76%', '53%', NA, NA, NA, '68%', '98%', '45%', NA, NA, '79%', '93%', '48%' ) ) What's the best way to go about achieving this using R, ideally using dplyr if possible?

Reputation: 531

How to move the first instance of a string in a column so it's before that string's group while accounting for multiple columns?

I have a dataframe with multiple columns like this:

df <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin: Good',
    'Wisconsin: Neutral',
    'Wisconsin: Bad',
    NA,
    'Minnesota: Good',
    'Minnesota: Neutral',
    'Minnesota: Bad',
    NA,
    'New England',
    'New Hampshire: Good',
    'New Hampshire: Neutral',
    'New Hampshire: Bad',
    NA,
    'Vermont: Good',
    'Vermont: Neutral',
    'Vermont: Bad'
  ),
  TotalPct = c(
    NA,
    '75%',
    '71%',
    '65%',
    NA,
    '82%',
    '76%',
    '53%',
    NA,
    NA,
    '68%',
    '98%',
    '45%',
    NA,
    '79%',
    '93%',
    '48%'
  )
)

I'm interested in doing two things to this dataframe:

Grab the first instance of the Good level for each state and place it in a new row about the Good level, and...
Remove the state names from each level so it just shows Good, Neutral, and Bad below each state name.

The result would look like this:

df_desired <- data.frame(
  Level = c(
    'Midwest',
    'Wisconsin',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Minnesota',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'New England',
    'New Hampshire',
    'Good',
    'Neutral',
    'Bad',
    NA,
    'Vermont',
    'Good',
    'Neutral',
    'Bad'
  ),
  TotalPct = c(
    NA,
    NA,
    '75%',
    '71%',
    '65%',
    NA,
    NA,
    '82%',
    '76%',
    '53%',
    NA,
    NA,
    NA,
    '68%',
    '98%',
    '45%',
    NA,
    NA,
    '79%',
    '93%',
    '48%'
  )
)

What's the best way to go about achieving this using R, ideally using dplyr if possible?

Upvotes: 1

Answers (3)

PaulS

Reputation: 25333

Another possible solution:

library(tidyverse)

df %>% 
  separate_rows(Level,sep=": ") %>% 
  filter(
    (Level %in% c("Good", "Neutral", "Bad") | is.na(TotalPct)
     | lead(Level) == "Good")) %>% 
  mutate(TotalPct=ifelse(!(Level %in% c("Good", "Neutral", "Bad")), NA, TotalPct))

#> # A tibble: 21 × 2
#>    Level     TotalPct
#>    <chr>     <chr>   
#>  1 Midwest   <NA>    
#>  2 Wisconsin <NA>    
#>  3 Good      75%     
#>  4 Neutral   71%     
#>  5 Bad       65%     
#>  6 <NA>      <NA>    
#>  7 Minnesota <NA>    
#>  8 Good      82%     
#>  9 Neutral   76%     
#> 10 Bad       53%     
#> # … with 11 more rows

Upvotes: 1

Jon Spring

Reputation: 66500

This feels clunky but I think it gets there:

library(tidyverse)
df %>%
  separate(Level, c("region", "rating"), sep = ": ") %>%
  mutate(copies = if_else(rating != "Good" | is.na(rating), 1, 2)) %>%
  uncount(copies, .id = "id") %>%
  mutate(TotalPct = if_else(rating == "Good" & id == 1, NA_character_, TotalPct),
         level = if_else(is.na(TotalPct), region, rating)) %>%
  select(level, TotalPct)

Result

           level TotalPct
1        Midwest     <NA>
2      Wisconsin     <NA>
3           Good      75%
4        Neutral      71%
5            Bad      65%
6           <NA>     <NA>
7      Minnesota     <NA>
8           Good      82%
9        Neutral      76%
10           Bad      53%
11          <NA>     <NA>
12   New England     <NA>
13 New Hampshire     <NA>
14          Good      68%
15       Neutral      98%
16           Bad      45%
17          <NA>     <NA>
18       Vermont     <NA>
19          Good      79%
20       Neutral      93%
21           Bad      48%

Upvotes: 0

Mossa

Reputation: 1708

I'm a bit unclear as to what you want, but this is usually very useful format:

df %>%
  separate(Level, c("State", "Level"), sep = ": ") %>%
  na.omit() %>% 
  print(n = Inf)
# A tibble: 12 x 3
   State         Level   TotalPct
   <chr>         <chr>   <chr>   
 1 Wisconsin     Good    75%     
 2 Wisconsin     Neutral 71%     
 3 Wisconsin     Bad     65%     
 4 Minnesota     Good    82%     
 5 Minnesota     Neutral 76%     
 6 Minnesota     Bad     53%     
 7 New Hampshire Good    68%     
 8 New Hampshire Neutral 98%     
 9 New Hampshire Bad     45%     
10 Vermont       Good    79%     
11 Vermont       Neutral 93%     
12 Vermont       Bad     48%

Upvotes: 0

How to move the first instance of a string in a column so it&#39;s before that string&#39;s group while accounting for multiple columns?

Answers (3)

Related Questions

How to move the first instance of a string in a column so it's before that string's group while accounting for multiple columns?