Reputation: 1
Thanks in advance! I have been trying this for a few days, and I am kind of stuck. I am trying to loop through a text file (imported as a list) and create a data frame from it. The data frame should start a new row whenever a list item contains a day of the week, and that item should populate the first column (V1). I want to put the comments that follow in the second column (V2), and I may have to concatenate several strings together. I am trying to use a conditional with grepl(), but I am kind of lost on the logic after I set up the initial data frame.
Here is example text I am bringing into R (it is Facebook data from a text file). The numbers in []'s are the list indices. It is a lengthy file (50K+ lines), but I have the date column set up.
[1] Thursday, August 25, 2016 at 3:57pm EDT
[2] Football time!! We need to make plans!!!! I texted my guy, though haven't been in touch sense last year. So we'll see on my end!!! What do you have cooking???
[3]Sunday, August 14, 2016 at 9:17am EDT
[4]Michael shared Jason post.
[5]This bird is a lot smarter than the majority of political posts I have read recently here
[6]Sunday, August 14, 2016 at 8:44am EDT
[7]Michael and Kurt are now friends.
The end result would be a data frame where each line containing a day of the week starts a new row, and the list items that follow it are concatenated into the second column. So the end data frame would be
Row 1 ([1] in V1 and [2] in V2)
Row 2 ([3] in V1 and [4],[5] in V2)
Row 3 ([6] in V1 and [7] in V2)
Here is the start of my code, and I can get V1 to populate correctly, but not the second column of the data frame.
### Read in the text file
temp <- readLines("C:/Program Files/R/Text Mining/testa.txt")
### Remove empty lines from the text file
temp <- temp[temp!=""]
### Create the temp char file as a list file
tmp <- as.list(temp)
### A days vector for searching through the list of days.
days <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday")
df <- {}
### Loop through the list
for (n in 1:length(tmp)) {
  ### Search to see if there is a day in the list item
  for (i in 1:length(days)) {
    if (grepl(days[i], tmp[n]) == 1) {
      ### Bind the row to the df if there is a day in the list item
      df <- rbind(df, tmp[n])
    }
  }
  ### I know this is wrong, I am trying to create a vector to concatenate and add to the data frame, but I am struggling here.
  d <- c(d, tmp[n])
}
Upvotes: 0
Views: 96
Reputation: 43344
Here's an option using the tidyverse:
library(tidyverse)
text <- "[1] Thursday, August 25, 2016 at 3:57pm EDT
[2] Football time!! We need to make plans!!!! I texted my guy, though haven't been in touch sense last year. So we'll see on my end!!! What do you have cooking???
[3]Sunday, August 14, 2016 at 9:17am EDT
[4]Michael shared Jason post.
[5]This bird is a lot smarter than the majority of political posts I have read recently here
[6]Sunday, August 14, 2016 at 8:44am EDT
[7]Michael and Kurt are now friends."
df <- tibble(lines = read_lines(text)) %>%    # read data, set up data frame (tibble() replaces the deprecated data_frame())
    filter(lines != '') %>%    # filter out empty lines
    # set grouping by cumulative number of rows with weekdays in them
    # (weekdays() needs dates, so build one week of them; names follow the session locale)
    group_by(grp = cumsum(grepl(paste(weekdays(Sys.Date() + 0:6), collapse = '|'), lines))) %>%
    # collapse each group to two columns
    summarise(V1 = lines[1], V2 = list(lines[-1]))
df
## # A tibble: 3 × 3
##     grp V1                                          V2
##   <int> <chr>                                       <list>
## 1     1 [1] Thursday, August 25, 2016 at 3:57pm EDT <chr [1]>
## 2     2 [3]Sunday, August 14, 2016 at 9:17am EDT    <chr [2]>
## 3     3 [6]Sunday, August 14, 2016 at 8:44am EDT    <chr [1]>
This approach uses a list column for V2, which is probably the best approach in terms of preserving your data, but use paste or toString to collapse each group into a single string if you need to.
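If a single string per row is what you want, the collapse could look something like the sketch below. It is written in base R so it runs without extra packages, it spells the day names out rather than deriving them, and it uses a shortened copy of the sample lines. The names is_header, body_grp and result are just illustrative.

```r
# Sample lines adapted from the question (bracketed list indices dropped,
# first comment shortened)
lines <- c(
  "Thursday, August 25, 2016 at 3:57pm EDT",
  "Football time!!",
  "Sunday, August 14, 2016 at 9:17am EDT",
  "Michael shared Jason post.",
  "This bird is a lot smarter than the majority of political posts",
  "Sunday, August 14, 2016 at 8:44am EDT",
  "Michael and Kurt are now friends."
)

# Day names written out so the pattern does not depend on the session locale
days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")
is_header <- grepl(paste(days, collapse = "|"), lines)

# Each header opens a new group; the lines after it share its group id
grp <- cumsum(is_header)

# Keep every group as a factor level so a header with no comments yields ""
body_grp <- factor(grp[!is_header], levels = seq_len(sum(is_header)))

# One row per header: the header goes to V1, its comments are pasted into V2
result <- data.frame(
  V1 = lines[is_header],
  V2 = unname(vapply(split(lines[!is_header], body_grp),
                     paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)
print(result)
```

Swapping collapse = " " for collapse = ", " gives the same result as toString. Any lines that appear before the first header fall into group 0 and are dropped here, which may or may not be what you want.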
Roughly equivalent base R:
df <- data.frame(V2 = readLines(textConnection(text)), stringsAsFactors = FALSE)
df <- df[df$V2 != '', , drop = FALSE]
# weekdays() needs dates, so build one week of them; names follow the session locale
df$grp <- cumsum(grepl(paste(weekdays(Sys.Date() + 0:6), collapse = '|'), df$V2))
df$V1 <- ave(df$V2, df$grp, FUN = function(x){x[1]})
df <- aggregate(V2 ~ grp + V1, df, FUN = function(x){x[-1]})
df
## grp V1
## 1 1 [1] Thursday, August 25, 2016 at 3:57pm EDT
## 2 2 [3]Sunday, August 14, 2016 at 9:17am EDT
## 3 3 [6]Sunday, August 14, 2016 at 8:44am EDT
## V2
## 1 [2] Football time!! We need to make plans!!!! I texted my guy, though haven't been in touch sense last year. So we'll see on my end!!! What do you have cooking???
## 2 [4]Michael shared Jason post., [5]This bird is a lot smarter than the majority of political posts I have read recently here
## 3 [7]Michael and Kurt are now friends.
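One caveat with matching day names anywhere in a line: with 50K+ lines, a comment that merely mentions a day would also start a new row. A stricter pattern might help. The sketch below assumes every timestamp line begins with the day name followed by a comma, optionally preceded by a "[n]" marker; that is a guess from the sample, not something confirmed about the whole file. The "Dinner on Friday" line is made up for illustration.

```r
lines <- c(
  "[1] Thursday, August 25, 2016 at 3:57pm EDT",
  "[2] Dinner on Friday? Let me know.",   # hypothetical comment mentioning a day
  "[3]Sunday, August 14, 2016 at 9:17am EDT",
  "[4]Michael shared Jason post."
)

days <- c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday")

# Loose match: TRUE for any line that mentions a day name anywhere
loose <- grepl(paste(days, collapse = "|"), lines)

# Stricter match (assumed layout): optional "[n]" marker, optional spaces,
# then a day name immediately followed by a comma, anchored at line start
strict_pattern <- paste0("^(\\[[0-9]+\\])?[[:space:]]*(",
                         paste(days, collapse = "|"), "),")
strict <- grepl(strict_pattern, lines)

print(data.frame(loose, strict))
```

Here the loose match flags the second line as a header while the strict one does not, so cumsum(strict) would keep that comment in the first group.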
Upvotes: 1