Reputation: 31
I have a text file with data line items that looks like:
I'd like to delete the lines that contain any of the following values (I have another text file that lists the values to be deleted): PPP QQQ
To end up with:
I have never used R and would like to know if there is a way to have this done. If it can be done in a faster way in Python, please let me know. I am open to options.
Upvotes: 2
Views: 663
Reputation: 205
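I am not familiar with R, but here's how I'd do it in Python:
# read every line first, because opening the file with "w" truncates it
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        # write back only the lines that do not contain the string
        if "string to delete" not in line:
            f.write(line)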
EDIT: to read the strings to exclude from another file (one per line), you'd do the following:
with open("to be deleted.txt", "r") as f:
    # strip the trailing newlines and skip blank lines
    parts = [p.strip() for p in f if p.strip()]
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        # keep the line only if none of the strings occur in it
        if not any(part in line for part in parts):
            f.write(line)
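A minimal sketch of the same idea as a reusable function, assuming plain substring matching is what's wanted. The name `drop_matching_lines` is made up for illustration, and the sample lines are borrowed from the example data further down the thread. It works on lists of strings, so it can be tried without touching any files.

```python
def drop_matching_lines(lines, patterns):
    """Return the lines that contain none of the given substrings."""
    # Strip newlines and ignore blank patterns: an empty string is "in"
    # every line and would therefore delete everything.
    patterns = [p.strip() for p in patterns if p.strip()]
    return [line for line in lines if not any(p in line for p in patterns)]


# Hypothetical in-memory stand-ins for the two files
sample = ["1~123~JJJ\n", "2~223~AAA\n", "4~567~PPP\n", "5~785~QQQ\n"]
kept = drop_matching_lines(sample, ["PPP\n", "QQQ\n", "\n"])
print("".join(kept), end="")
```

To use it on real files, pass the results of `readlines()` for both files and write the returned list back with `writelines()`.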
Upvotes: 2
Reputation: 521429
You could use a combination of readLines and grepl, followed by writeLines:
conn <- file("path/to/input.txt")
lines <- readLines(conn)
close(conn)
lines <- lines[grepl("^(?!.*\\b(?:PPP|QQQ)\\b).*$", lines, perl=TRUE)]
conn <- file("path/to/input.txt", "w") # assuming you want to write to the same file
writeLines(lines, conn)
close(conn)
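For anyone taking the Python route instead, a rough equivalent of this whole-word filter might look like the sketch below. The function name `drop_whole_word_matches` and the `6~111~PPPX` line are invented for illustration; the latter shows that a value embedded in a longer word is not treated as a match.

```python
import re


def drop_whole_word_matches(lines, words):
    """Return the lines in which none of `words` occurs as a whole word."""
    if not words:
        return list(lines)
    # re.escape keeps characters such as "." or "+" in a word literal.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    return [line for line in lines if not pattern.search(line)]


# "6~111~PPPX" is a made-up line: PPP is followed by a word character there
sample = ["3~444~LLL", "4~567~PPP", "6~111~PPPX"]
print(drop_whole_word_matches(sample, ["PPP", "QQQ"]))
```

Unlike a plain substring test, this keeps `6~111~PPPX`, because `\b` requires a word boundary on both sides of the value.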
Upvotes: 3
Reputation: 61164
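You can use grep for integer indexing (note that if nothing matches, -grep(...) selects zero rows, so !grepl(...) is the safer form when a match is not guaranteed):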
> df[-grep("PPP|QQQ", df$V1), , drop=FALSE]
         V1
1 1~123~JJJ
2 2~223~AAA
3 3~444~LLL
Where df is a data.frame:
df <- read.table(text="1~123~JJJ
2~223~AAA
3~444~LLL
4~567~PPP
5~785~QQQ", header=FALSE, stringsAsFactors=FALSE)
Upvotes: 2