Magnus

Reputation: 760

How to turn character vector into data frame

My data looks something like this:

13  EDHEC Business School
14  Columbia U and IZA
15  Yale U and Abdul Latif Jameel Poverty Action Lab
16  Carnegie Mellon U
17  Columbia U

As you can see, some of the entries contain multiple entities, which I don't want. Since the separate_rows function can't handle delimiters consisting of multiple characters (or so I gather), I plan to use gsub to replace every instance of "and" with the letter "ö" (which is unlikely to appear naturally in the material). I can then use "ö" as the separator in separate_rows.

I start by typing:

distinctAF <- gsub("and", "ö", distinctAF)

This seems to work, but it has turned my data frame into a character vector. I tried to change it back with as.data.frame, but to no avail:

distinctAF <- as.data.frame(distinctAF)

distinctAF

1   c("MIT", "NBER", "U MI", "Cornell U", "U VA", "Harvard....

I've tried transforming the vector to a matrix as a first step, but this doesn't seem to work either:

distinctAF <- matrix(distinctAF, ncol = 1, byrow = TRUE)

I've also tried to cbind the character vector with a numeric vector of the same length, hoping to produce a matrix. Strangely, this creates a matrix with one copy of the character vector per number in the numeric vector.

How do I turn my character vector back into a data frame (with one value per row) so that I can separate my rows as intended?

I feel like I've tried everything, this shouldn't be that hard ^^

link to file:

https://www.dropbox.com/s/d4z58w6xvmkyepy/affiliations.csv?dl=0

Upvotes: 0

Views: 332

Answers (1)

Alp Aribal

Reputation: 370

Maybe using stringr can help.

require(data.table) # I prefer data.table to data.frame
require(stringr) # Used for string ops

# Read the data
data <- fread("affiliations.csv", skip = 1)
colnames(data) <- c("id", "aff")

# Replace `and`s with `ö`s
data[, mod_aff := str_replace_all(aff, " and ", " ö ")]

# Check that it worked
head(data[str_detect(mod_aff, "ö")])
#    id                                              aff                                        mod_aff
# 1: 14                               Columbia U and IZA                               Columbia U ö IZA
# 2: 15 Yale U and Abdul Latif Jameel Poverty Action Lab Yale U ö Abdul Latif Jameel Poverty Action Lab
# 3: 21                            ETH Zurich and CESifo                            ETH Zurich ö CESifo
# 4: 22                          U Copenhagen and CESifo                          U Copenhagen ö CESifo
# 5: 26                                U Chicago and IZA                                U Chicago ö IZA
# 6: 28                              Bocconi U and IGIER                              Bocconi U ö IGIER
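To get from there to one affiliation per row, a base R sketch along these lines might work. It also illustrates what seems to have gone wrong in the question: `gsub` coerces a non-character argument with `as.character`, and for a data frame that yields one deparsed string per column rather than one string per row. The toy data frame below is a hypothetical stand-in for the real file (same name as in the question, with an assumed column name `aff`), and `source_row` is a helper column I made up for illustration. Splitting on `" and "` with the surrounding spaces avoids hitting names that merely contain the letters "and", but it would still split an institution whose own name contains the word. As far as I know, `separate_rows` from tidyr takes a regular expression as `sep`, so passing `sep = " and "` there may also work without a placeholder character; that variant is not shown here to keep the example dependency-free.

```r
# Hypothetical stand-in for the question's data (assumed column name: aff)
distinctAF <- data.frame(
  aff = c("EDHEC Business School",
          "Columbia U and IZA",
          "Yale U and Abdul Latif Jameel Poverty Action Lab",
          "Carnegie Mellon U",
          "Columbia U"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame to gsub() coerces it with as.character(),
# which produces one deparsed string per column, not one string per row
collapsed <- gsub(" and ", " ; ", distinctAF)
length(collapsed)

# Working on the column instead: split each entry on the literal " and "
parts <- strsplit(distinctAF$aff, " and ", fixed = TRUE)

# Rebuild a data frame with one affiliation per row;
# source_row (illustrative helper) records which original row each piece came from
long <- data.frame(
  source_row = rep(seq_along(parts), lengths(parts)),
  aff = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

With the `data.table` from above, the same idea would be to call `strsplit` on `data$aff` and repeat `data$id` by `lengths(parts)` instead of using `seq_along`.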

Upvotes: 0
