Reputation: 67
I have a large dataframe (approximately 40000 x 500) that I read from a .CSV file. The dataset contains a list of error code combinations, with each combination specifically listing all the unique codes. I'm searching for the rows that contain a certain piece of string, like "name="
, to see if it is known who listed that combination. The problem is that I do not know in which column this may occur, since the 10th column and onwards are irregular per row, as the number of error codes per combination is irregular. This means that after some point the rows are empty, except for the biggest error code combination. I only know that if the row contains the piece of string I'm looking for, it will be in one of the last 6 columns of a row containing any information. If the piece of string I'm looking for is found in a row, I would like an answer in a different column.
I have extensively looked for a solution, but I couldn't find anything that serves my needs. Also, I'm new to R.
[Fake dataset]
Year Problem_ID Number.of.errors X X.1 X.2 X.3 X.4
1 2005 Server 2 X-4555 X-1222 name=Sara
2 2011 Hardware 3 X-8922 X-7644 X-6485
3 2015 Software 4 X-8494 X-4321 X-7452 X-5321 name=John
4 2018 Hardware 2 X-6901 X-1121
5 2007 Server 3 X-7655 X-6555 X-8999 name=Sara
[Desired output]
Year Problem_ID Number.of.errors Name X X.1 X.2 X.3 X.4
1 2005 Server 2 1 X-4555 X-1222 name=Sara
2 2011 Hardware 3 0 X-8922 X-7644 X-6485
3 2015 Software 4 1 X-8494 X-4321 X-7452 X-5321 name=John
4 2018 Hardware 2 0 X-6901 X-1121
5 2007 Server 3 1 X-7655 X-6555 X-8999 name=Sara
Upvotes: 4
Views: 6753
Reputation: 30
As things change, you can use dplyr
filter
and if_any
your_data %>%
filter(if_any(everything(), ~str_detect(tolower(.), "name=")))
Upvotes: 1
Reputation: 20095
Option #1:
You can use apply
as suggested by @RomanLuštrik to get name
column. One has to check if any column in that row has matching text to name=
using any
and grep
. The solution will be like:
df$name <- apply(df, 1, function(x)as.integer(any(grep("name=",x))))
df
# Year Problem_ID Number.of.errors X X.1 X.2 X.3 X.4 name
# 1 2005 Server 2 X-4555 X-1222 name=Sara <NA> <NA> 1
# 2 2011 Hardware 3 X-8922 X-7644 X-6485 <NA> <NA> 0
# 3 2015 Software 4 X-8494 X-4321 X-7452 X-5321 name=John 1
# 4 2018 Hardware 2 X-6901 X-1121 <NA> <NA> <NA> 0
# 5 2007 Server 3 X-7655 X-6555 X-8999 name=Sara <NA> 1
Option#2:
In case OP
is interested in only rows which contains name=
text then it would be user csv
line-wise and then filter out rows containing name=
.
All_lines <- readLines(file_name)
desired_lines <- grep("name=", All_lines, value = TRUE)
The desired_lines
got only those rows which contains name=
.
Data:
df <- read.table(text =
"Year Problem_ID Number.of.errors X X.1 X.2 X.3 X.4
1 2005 Server 2 X-4555 X-1222 name=Sara NA NA
2 2011 Hardware 3 X-8922 X-7644 X-6485 NA NA
3 2015 Software 4 X-8494 X-4321 X-7452 X-5321 name=John
4 2018 Hardware 2 X-6901 X-1121 NA NA NA
5 2007 Server 3 X-7655 X-6555 X-8999 name=Sara NA",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 4