RobWiederstein

Reputation: 49

How to convert column list to data frame in R

I'm scraping voting history data from PDFs. The names of the members who voted for each bill are stored in a single variable, separated by spaces. I want to reshape the data frame so that each name appears on its own row, as in the desired output below.

I split the names and removed the whitespace, which produced a list of varying lengths (depending on who voted for the bill) in a new column of the data frame. I also experimented with the separate function in the tidyr package.

#data.frame as is
bill <- c("HB1", "HB2")
names <- c("a    b", "a")
df.0 <- data.frame(bill = bill, names = names, stringsAsFactors = FALSE)
df.0

#data.frame desired
bill <- c("HB1", "HB1", "HB2")
names <- c("a", "b", "a")
df.1 <- data.frame(bill = bill, names = names, stringsAsFactors = FALSE)
df.1
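
For reference, the intermediate result described above (a list column whose elements vary in length) can be reproduced with base R's strsplit. This is only a sketch of what that step probably looked like; the column name names_list is made up for illustration.

```r
# Rebuild the example input so the block runs on its own
df.0 <- data.frame(bill = c("HB1", "HB2"),
                   names = c("a    b", "a"),
                   stringsAsFactors = FALSE)

# Split each string on runs of whitespace; the result is a list,
# stored here in a hypothetical column called names_list
df.0$names_list <- strsplit(trimws(df.0$names), "\\s+")

# Number of names recorded for each bill
lengths(df.0$names_list)
```

Each element of names_list is a character vector, which is why the column cannot be read as an ordinary flat column until it is expanded into rows.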

Upvotes: 1

Views: 217

Answers (2)

G. Grothendieck

Reputation: 269694

1) tidyr::separate_rows Try separate_rows in tidyr:

library(dplyr)
library(tidyr)

df.0 %>% separate_rows(names)

giving:

  bill names
1  HB1     a
2  HB1     b
3  HB2     a
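
One caveat for real legislator names: as far as I know, separate_rows by default splits on any run of characters that are not alphanumeric or a period (sep = "[^[:alnum:].]+"), so a hyphenated surname would be broken apart. A sketch with made-up names, passing an explicit whitespace separator instead (the data frame votes and its contents are hypothetical):

```r
library(tidyr)

# Hypothetical data containing a hyphenated surname
votes <- data.frame(bill = c("HB1", "HB2"),
                    names = c("Smith-Jones    Lee", "Lee"),
                    stringsAsFactors = FALSE)

# Split on whitespace only, so "Smith-Jones" stays intact
res <- separate_rows(votes, names, sep = "\\s+")
res
```

With the single-letter names in the question the default separator is enough, so this only matters once the scraped names contain punctuation.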

1a) tidyr::unnest A different tidyr solution can be fashioned from strsplit and unnest:

df.0 %>%
  mutate(names = strsplit(names, "\\s+")) %>%
  unnest(names)

giving:

  bill names
1  HB1     a
2  HB1     b
3  HB2     a

2) stack/strsplit This alternative uses no packages. Here we use strsplit to split names into a list of character vectors. Name the list elements by bill and use stack to convert the list back to a data.frame. stack gives the result hard-coded column names (values and ind), so reorder the columns and use setNames to restore the original names.

setNames(with(df.0, stack(setNames(strsplit(names, "\\s+"), bill)))[2:1], names(df.0))

giving:

  bill names
1  HB1     a
2  HB1     b
3  HB2     a
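
The one-liner above can be hard to read, so here is the same idea unrolled into steps. The intermediate names parts and long are just for illustration, and the final as.character conversion is an optional extra that the one-liner does not do.

```r
# Rebuild the example input so the block runs on its own
df.0 <- data.frame(bill = c("HB1", "HB2"),
                   names = c("a    b", "a"),
                   stringsAsFactors = FALSE)

# 1. Split each string on whitespace: a list of character vectors
parts <- strsplit(df.0$names, "\\s+")

# 2. Label each list element with its bill
parts <- setNames(parts, df.0$bill)

# 3. stack() flattens the named list into columns "values" and "ind"
long <- stack(parts)

# 4. Put "ind" first, then restore the original column names
long <- setNames(long[2:1], names(df.0))

# stack() returns the label column as a factor; convert it back to character
long$bill <- as.character(long$bill)
long
```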

Upvotes: 0

nghauran

Reputation: 6768

Try out:

library(tidyr)
separate_rows(df.0, names)

# output
  bill names
1  HB1     a
2  HB1     b
3  HB2     a

Upvotes: 3
