Reputation: 3881
From a CSV file I loaded data into an R data frame that looks like this:
> head(mydata)
row lengthArray sports num_runs percent_runs
1 0 4 [24, 18, 24, 18] 0 0
2 1 10 [2, 2, 2, 2, 2, 2, 2, 2, 2, 2] 0 0
3 2 4 [0, 0, 0, 0] 0 0
4 3 2 [0, 0] 0 0
5 4 2 [18, 18] 0 0
6 5 1 [0] 0 0
I can access the integer columns and get their types with no problem, but I can't figure out how to access sports:
> class(mydata[4,3])
[1] "factor"
> string_factor = mydata[1,3]
> string_factor
[1] [24, 18, 24, 18]
6378 Levels: [0] [0, 0] [0, 0, 0] [0, 0, 0, 0] ... [9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
> class(string_factor)
[1] "factor"
> string_factor_numeric = as.numeric(string_factor)
> string_factor_numeric
[1] 5181
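For context on the 5181 above: a factor stores each value as an integer index into its levels, and as.numeric() on a factor returns that index. It does not parse anything from the text. Here is a minimal sketch with a small hypothetical factor f (toy values, not the real file):

```r
# Toy factor built from strings shaped like the sports column (hypothetical data)
f <- factor(c("[24, 18, 24, 18]", "[0, 0]", "[0]"))

# Level order is locale-dependent, so the exact codes may differ between machines
codes <- as.numeric(f)      # integer level codes, not the bracketed numbers
text  <- as.character(f)    # the original strings

# Each code is simply the element's position in levels(f)
all(codes == match(text, levels(f)))  # TRUE
```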
I guess the best R response would be "don't do this", but this is how the data is coming, so I am wondering how I can get those numbers out of the array so that I can use them.
I should also mention that the approach from "Convert data.frame columns from factors to characters" gave no error message but had no effect. The array column was still classed as a factor.
UPDATE: as the comments suggest, this gets you part of the way:
mydata[,3] <- as.character(mydata[,3])
However this still does not get you to an array with individually accessible elements.
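Here is one way to get from the character column to individually accessible elements, sketched in base R. It assumes every entry looks like "[a, b, ...]" with plain numbers inside. The vector sports is a hypothetical stand-in for mydata[,3]:

```r
# Hypothetical stand-in for the sports column as read from the CSV (a factor)
sports <- factor(c("[24, 18, 24, 18]", "[2, 2, 2]", "[0]"))

# 1. factor -> character, so we work with the text, not the level codes
txt <- as.character(sports)
# 2. drop the brackets and spaces, e.g. "[24, 18, 24, 18]" -> "24,18,24,18"
txt <- gsub("\\[|\\]| ", "", txt)
# 3. split on commas and turn each piece into a number
sports_list <- lapply(strsplit(txt, ","), as.numeric)

sports_list[[1]]     # 24 18 24 18
sports_list[[1]][3]  # 24
```

A list with one element per row can be stored back as a list column, e.g. mydata$sports <- sports_list.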
Upvotes: 4
Views: 124
Reputation: 21621
Here's another idea using splitstackshape:
library(splitstackshape)
library(dplyr)
mydata %>%
mutate(sports = gsub("\\[|\\]", "", sports)) %>%
cSplit("sports", sep = ",", direction = "wide")
Which gives:
row lengthArray num_runs percent_runs sports_01 sports_02 sports_03 sports_04 sports_05 sports_06 sports_07 sports_08 sports_09 sports_10
1: 0 4 0 0 24 18 24 18 NA NA NA NA NA NA
2: 1 10 0 0 2 2 2 2 2 2 2 2 2 2
3: 2 4 0 0 0 0 0 0 NA NA NA NA NA NA
4: 3 2 0 0 0 0 NA NA NA NA NA NA NA NA
5: 4 2 0 0 18 18 NA NA NA NA NA NA NA NA
6: 5 1 0 0 0 NA NA NA NA NA NA NA NA NA
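If you would rather not depend on splitstackshape, a similar wide layout can be built in base R. This is a minimal sketch that assumes the bracketed format above. Unlike cSplit, it returns a plain numeric matrix, and the column names sports_1, sports_2, ... are my own choice:

```r
# Hypothetical sample of the sports strings
sports <- c("[24, 18, 24, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]", "[0]")

# Strip brackets and spaces, split on commas, convert to numbers
pieces <- lapply(strsplit(gsub("\\[|\\]| ", "", sports), ","), as.numeric)

# Pad every row to the longest length: indexing past the end returns NA
n <- max(lengths(pieces))
wide <- do.call(rbind, lapply(pieces, function(p) p[seq_len(n)]))
colnames(wide) <- paste0("sports_", seq_len(n))

wide[1, ]  # first row: 24 18 24 18 followed by six NAs
```

To put the columns back next to the other variables, something like cbind(mydata[-3], wide) should work when the rows line up.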
Or, as per @thelatemail's comment, you could also store a list as a column:
library(dplyr)
library(stringi)
df <- mydata %>%
mutate(sports = stri_extract_all(sports, regex = "[0-9]+"))
The + matters here: a pattern that matches a single digit would split 24 into "2" and "4".
Which will give you the following data structure:
> str(df)
#'data.frame': 6 obs. of 5 variables:
# $ row : int 0 1 2 3 4 5
# $ lengthArray : int 4 10 4 2 2 1
# $ sports :List of 6
# ..$ : chr "24" "18" "24" "18"
# ..$ : chr "2" "2" "2" "2" ...
# ..$ : chr "0" "0" "0" "0"
# ..$ : chr "0" "0"
# ..$ : chr "18" "18"
# ..$ : chr "0"
# $ num_runs : int 0 0 0 0 0 0
# $ percent_runs: int 0 0 0 0 0 0
You can then access the elements of the list like this:
> df$sports[[1]][1] #first element of first list
#[1] "24"
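The list elements are still character strings. If you need numbers, convert each element with as.numeric(). The self-contained sketch below swaps in base R's regmatches() and gregexpr() for stringi, using the same idea of extracting every run of digits. The sports vector is hypothetical, and the pattern ignores minus signs and decimal points:

```r
# Hypothetical sample of the sports strings
sports <- c("[24, 18, 24, 18]", "[2, 2]", "[0]")

# Extract every run of digits: one character vector per input string
digits <- regmatches(sports, gregexpr("[0-9]+", sports))

# Convert each character vector to numeric and store the result as a list column
df <- data.frame(row = 0:2)
df$sports <- lapply(digits, as.numeric)

df$sports[[1]][1]  # 24, a number this time rather than "24"
```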
Upvotes: 4
Reputation: 145755
Here's your data, as produced by dput:
mydata = structure(list(row = 0:5, lengthArray = c(4L, 10L, 4L, 2L, 2L,
1L), sports = structure(c(6L, 5L, 1L, 2L, 4L, 3L), .Label = c("[0, 0, 0, 0]",
"[0, 0]", "[0]", "[18, 18]", "[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]",
"[24, 18, 24, 18]"), class = "factor"), num_runs = c(0L, 0L,
0L, 0L, 0L, 0L), percent_runs = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("row",
"lengthArray", "sports", "num_runs", "percent_runs"), class = "data.frame", row.names = c(NA,
-6L))
First we convert the sports column to character:
mydata$sports = as.character(mydata$sports)
Now I'll get rid of the brackets and spaces (leaving the commas)
library(stringr)
mydata$sports = str_replace_all(mydata$sports, pattern = "\\[|\\]| ", "")
And lastly, separate the sports column into multiple columns (fill = "right" pads the shorter rows with NA without emitting a warning):
library(tidyr)
mydata = separate(mydata, sports, into = paste0("sport", 1:max(mydata$lengthArray)), sep = ",", extra = "drop", fill = "right")
mydata
# row lengthArray sport1 sport2 sport3 sport4 sport5 sport6 sport7 sport8 sport9 sport10 num_runs percent_runs
#1 0 4 24 18 24 18 <NA> <NA> <NA> <NA> <NA> <NA> 0 0
#2 1 10 2 2 2 2 2 2 2 2 2 2 0 0
#3 2 4 0 0 0 0 <NA> <NA> <NA> <NA> <NA> <NA> 0 0
#4 3 2 0 0 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 0 0
#5 4 2 18 18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 0 0
#6 5 1 0 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 0 0
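Note that the new sport columns hold character values, which is why missing entries print as <NA>. tidyr's separate() has a convert = TRUE argument that runs type.convert() on them. Alternatively, here is a minimal base-R sketch on a hypothetical two-row frame shaped like the output above:

```r
# Hypothetical frame shaped like the separate() output: sport columns are character
wide <- data.frame(row = 0:1, sport1 = c("24", "2"), sport2 = c("18", NA),
                   num_runs = c(0L, 0L), stringsAsFactors = FALSE)

# Pick out sport1, sport2, ... by name and convert them to integers
sport_cols <- grep("^sport[0-9]+$", names(wide))
wide[sport_cols] <- lapply(wide[sport_cols], as.integer)

str(wide)
```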
Upvotes: 1
Reputation: 11597
Recreating your data:
text = "
row lengthArray sports num_runs percent_runs
0 4 '[24, 18, 24, 18]' 0 0
1 10 '[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]' 0 0
2 4 '[0, 0, 0, 0]' 0 0
3 2 '[0, 0]' 0 0
4 2 '[18, 18]' 0 0
5 1 '[0]' 0 0"
data <- read.table(text = text, header= TRUE)
You probably should take the values in sports and create new columns... but if you want to create the vectors inside the sports column, you can actually do that:
data$sports <- as.character(data$sports)
data$sports <- lapply(data$sports, function(x) eval(parse(text = paste0("c(", gsub("\\[|\\]", "", x),")"))))
Now, for example, if you want to get the third value of the first row of sports:
data$sports[[1]][[3]]
[1] 24
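Because eval(parse()) runs whatever code the string contains, it can be worth checking first that each entry really is a bracketed list of numbers. This is a minimal sketch: parse_sports() is a hypothetical helper name, and the pattern only accepts digits, commas and spaces inside the brackets:

```r
# Hypothetical helper: validate the string, then evaluate it as c(...)
parse_sports <- function(x) {
  # refuse anything other than "[", then digits/commas/spaces, then "]"
  if (!grepl("^\\[[0-9, ]*\\]$", x)) stop("unexpected sports value: ", x)
  eval(parse(text = paste0("c(", gsub("\\[|\\]", "", x), ")")))
}

sports <- c("[24, 18, 24, 18]", "[0, 0]", "[0]")
vecs <- lapply(sports, parse_sports)
vecs[[1]][[3]]  # 24
```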
Upvotes: 0