Reputation: 49
I'm scraping voting history data from PDFs. The names of the voters are stored in a single variable, separated by spaces. I want to reshape the data frame so that each name gets its own row.
I split the names and stripped the whitespace, which produced a list of varying lengths (depending on who voted for each bill) in a new column of the data frame. I also experimented with the separate function in the tidyr package.
#data.frame as is
bill <- c("HB1", "HB2")
names <- c("a b", "a")
df.0 <- data.frame(bill = bill, names = names, stringsAsFactors = FALSE)
df.0
#data.frame desired
bill <- c("HB1", "HB1", "HB2")
names <- c("a", "b", "a")
df.1 <- data.frame(bill = bill, names = names, stringsAsFactors = FALSE)
df.1
Upvotes: 1
Views: 217
Reputation: 269694
1) tidyr::separate_rows Try separate_rows in tidyr:
library(dplyr)
library(tidyr)
df.0 %>% separate_rows(names)
giving:
bill names
1 HB1 a
2 HB1 b
3 HB2 a
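One caveat worth sketching: by default separate_rows splits on any run of non-alphanumeric characters (other than "."), so a hyphenated name would be broken apart. The example below is only an illustration; the votes data frame and the names in it are made up, and it assumes whitespace is the only real delimiter in the scraped data.

```r
library(tidyr)

# Hypothetical data: one of the names contains a hyphen.
votes <- data.frame(bill = c("HB1", "HB2"),
                    names = c("Smith-Jones Lee", "Lee"),
                    stringsAsFactors = FALSE)

# Split on runs of whitespace only, so "Smith-Jones" stays in one piece.
res <- separate_rows(votes, names, sep = "\\s+")
res
```

If the real names never contain punctuation, the default separator is enough and the sep argument can be dropped.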
1a) tidyr::unnest A different tidyr solution can be fashioned from strsplit and unnest:
df.0 %>%
  mutate(names = strsplit(names, "\\s+")) %>%
  unnest(names)  # name the list-column explicitly; newer tidyr versions expect it
giving:
bill names
1 HB1 a
2 HB1 b
3 HB2 a
2) stack/strsplit This alternative uses no packages. Here we use strsplit to split names into a list of character vectors. Name the elements of that list with the bill values and use stack to convert it back to a data.frame. stack gives its output hard-coded column names (values and ind), so use setNames to restore the original names.
setNames(with(df.0, stack(setNames(strsplit(names, "\\s+"), bill)))[2:1], names(df.0))
giving:
bill names
1 HB1 a
2 HB1 b
3 HB2 a
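For readers who find the one-liner dense, the same steps can be spelled out one at a time. This is only a sketch of the approach above using base R; the intermediate names parts, long and res are introduced here for illustration, and the final as.character conversion is an optional extra that is not part of the one-liner.

```r
# Rebuild the example input from the question.
df.0 <- data.frame(bill = c("HB1", "HB2"), names = c("a b", "a"),
                   stringsAsFactors = FALSE)

# Step 1: split each string on runs of whitespace -> list of character vectors.
parts <- strsplit(df.0$names, "\\s+")

# Step 2: label each list element with the bill it belongs to.
parts <- setNames(parts, df.0$bill)

# Step 3: stack() flattens the named list into the columns `values` and `ind`.
long <- stack(parts)

# Step 4: put the bill column first and restore the original column names.
res <- setNames(long[2:1], names(df.0))

# stack() returns `ind` as a factor; convert it if a character column is wanted.
res$bill <- as.character(res$bill)
res
```

Without the last conversion the bill column is a factor, which prints the same way but can behave differently in later joins or comparisons.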
Upvotes: 0
Reputation: 6768
Try out:
library(tidyr)
separate_rows(df.0, names)
# output
bill names
1 HB1 a
2 HB1 b
3 HB2 a
Upvotes: 3