maros89

Reputation: 45

Subset dataframe based on partial string matching

I have a data frame containing university names mixed with names of departments, centres and institutions. I would like to extract all cells containing the string "University" and save them as a vector.

I have tried the grep function, but as I am quite new to R, I did not manage to write a call that works across multiple columns of the data frame.

This is my example:

V1 = c("asdad", "department of x", "University of California", "daadasda")
V2 = c("aadasd", "Florence University", "University of Seattle", "NA")
V3 = c("aadasd", "asdasdasd", "asdasdadads", "fsdfsdfsdf")
V4 = c("University of California", "Department of g", "asdasd", "sdfsdfsf")

df = as.data.frame(cbind(V1,V2,V3,V4))

Expected result:

Universities: University of California, University of Seattle, Florence University, University of California

The university names are scattered more or less randomly across the data frame, and I would like to extract them into a single vector. As I am also interested in the number of occurrences of particular universities, repeated names in the vector are desirable.

Upvotes: 0

Views: 499

Answers (1)

akrun

Reputation: 887851

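We can `unlist` the data.frame into a single vector and `grep` it for "University", using `value = TRUE` to return the matching strings rather than their positions: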

out <- data.frame(Universities = grep("University", unlist(df),
         ignore.case = TRUE, value = TRUE))
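
Since the question also asks about the number of occurrences, here is a minimal sketch of how the same idea might be extended with `table()`. It rebuilds the example data with `data.frame(..., stringsAsFactors = FALSE)` instead of `as.data.frame(cbind(...))`, which is an assumption made here so the columns are plain character vectors on any R version; the names `cells`, `universities` and `counts` are illustrative only.

```r
# Example data from the question
V1 <- c("asdad", "department of x", "University of California", "daadasda")
V2 <- c("aadasd", "Florence University", "University of Seattle", "NA")
V3 <- c("aadasd", "asdasdasd", "asdasdadads", "fsdfsdfsdf")
V4 <- c("University of California", "Department of g", "asdasd", "sdfsdfsf")
df <- data.frame(V1, V2, V3, V4, stringsAsFactors = FALSE)

# Flatten the data frame column by column into one character vector
cells <- unlist(df, use.names = FALSE)

# Keep every cell that contains "University"; duplicates are preserved
universities <- grep("University", cells, value = TRUE)

# Tabulate how often each university name occurs
counts <- table(universities)
print(counts)
```

Because `unlist` walks the data frame column by column, the matches come back in column order rather than row order. Add `ignore.case = TRUE` to the `grep` call if lower-case spellings such as "university" should also match.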

Upvotes: 1
