Reputation: 79
I have a data set containing a customer list. The first column (Kunden.Nr..Kurzname, of type factor) always has a number (1 to 4 digits) in front of the actual customer name, and I would like to remove that number. The data set currently looks like this:
   Kunden.Nr..Kurzname   Name..Vorname              Adresse            Postfach               PLZ
1  1529 33ER TAXI AG     33er Taxi AG               Jägerstrasse 5     <NA>                   4016
2  2384 4EYES GMBH       4eyes GmbH                 Grubenweg 25       <NA>                   4153
3  1548 A. SCHULMANN AG  A. Schulmann AG            Kernstrasse 10     <NA>                   8004
4  3427 AAA DENT AG      AAA Dent AG                Die Zahnärzte.ch   Centralbahnstrasse 20  4051
5  555 AARE SEELAND MOB  Aare Seeland mobil AG      Hauptstrasse 93    <NA>                   2560
6  856 AASTRA TELECOM S  Aastra Telecom Schweiz AG  Schulhausgasse 24  <NA>                   3113
And I would like to have it like this:
   Kunden.Nr..Kurzname  Name..Vorname              Adresse            Postfach               PLZ
1  33ER TAXI AG         33er Taxi AG               Jägerstrasse 5     <NA>                   4016
2  4EYES GMBH           4eyes GmbH                 Grubenweg 25       <NA>                   4153
3  A. SCHULMANN AG      A. Schulmann AG            Kernstrasse 10     <NA>                   8004
4  AAA DENT AG          AAA Dent AG                Die Zahnärzte.ch   Centralbahnstrasse 20  4051
5  AARE SEELAND MOB     Aare Seeland mobil AG      Hauptstrasse 93    <NA>                   2560
6  AASTRA TELECOM S     Aastra Telecom Schweiz AG  Schulhausgasse 24  <NA>                   3113
Basically, I need to remove everything up to and including the first space. I figured out that I probably have to use "gsub", but unfortunately I haven't used R for a long time. Any help is highly appreciated.
Upvotes: 1
Views: 108
Reputation: 5169
You can simply do gsub("^[0-9]{1,4}\\s","",df$Kunden.Nr..Kurzname)
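As a rough sketch of how the result could be written back into the data frame: the two-row df below is a made-up stand-in for the real data, and sub() is swapped in for gsub() on the assumption that only the leading number should go (the pattern is anchored with ^, so it can match at most once per string anyway).

```r
# Hypothetical stand-in for the real data; only the first column matters here.
df <- data.frame(
  Kunden.Nr..Kurzname = factor(c("1529 33ER TAXI AG", "555 AARE SEELAND MOB")),
  PLZ = c(4016, 2560)
)

# The factor is coerced to character; the result is a character vector.
# Wrap the right-hand side in factor() if the column should stay a factor.
df$Kunden.Nr..Kurzname <- sub("^[0-9]{1,4}\\s", "", df$Kunden.Nr..Kurzname)

print(df$Kunden.Nr..Kurzname)
# [1] "33ER TAXI AG"     "AARE SEELAND MOB"
```

Names that do not start with one to four digits followed by whitespace are left unchanged.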
Upvotes: 0
Reputation: 9676
The previous answers are somewhat overloaded. Here is a fairly straightforward suggestion that does exactly what you asked.
# DF is your data.frame
FindFirstSpace <- regexpr(" ", DF$Kunden.Nr..Kurzname, fixed = TRUE)
# substr() coerces the factor column to character; 1000 is just a stop value longer than any name
DF$Kunden.Nr..Kurzname <- substr(DF$Kunden.Nr..Kurzname, FindFirstSpace + 1, 1000)
regexpr returns the position of the first " " in each element of your character vector. Note that regexpr is made for finding expressions "like" your pattern, i.e. regular expressions, but fixed = TRUE makes it search for the literal string instead.
Then take the substring starting right after the first space. For the stop value you can take any number that is big enough.
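To make the two steps concrete, here is a small worked example on a plain character vector. The vector x and the variable names are made up for illustration, and nchar(x) is used as an exact stop value in place of a large constant.

```r
x <- c("1529 33ER TAXI AG", "555 AARE SEELAND MOB")

# Position of the first literal space in each element.
pos <- as.integer(regexpr(" ", x, fixed = TRUE))
print(pos)
# [1] 5 4

# Keep everything from the character after that space up to the end of the string.
result <- substr(x, pos + 1, nchar(x))
print(result)
# [1] "33ER TAXI AG"     "AARE SEELAND MOB"
```

If an element contains no space, regexpr returns -1, so substr starts at position 0 and the string comes back unchanged.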
Upvotes: 0
Reputation: 18585
I would like to suggest making use of groups:
gsub("^(\\d+)([[:space:]])(.+)$","\\3",x)
For example:
> x <- c("1529 33ER TAXI AG", "2384 4EYES GMBH")
> gsub("^(\\d+)([[:space:]])(.+)$","\\3",x)
[1] "33ER TAXI AG" "4EYES GMBH"
Courtesy of regex101.com.
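Since only the third group is used in the replacement, a single group is enough. A slimmer variant might look like the sketch below; it uses sub() on the assumption that one replacement per string is all that is needed, which holds here because the pattern is anchored at both ends.

```r
x <- c("1529 33ER TAXI AG", "2384 4EYES GMBH")

# Match the leading digits and one whitespace character, capture the rest,
# and replace the whole string with the captured part.
cleaned <- sub("^\\d+[[:space:]](.+)$", "\\1", x)
print(cleaned)
# [1] "33ER TAXI AG" "4EYES GMBH"
```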
Upvotes: 1