Simon

Reputation: 79

Removing numbers before

I have a dataset containing a customer list. The first column (Kunden.Nr..Kurzname, of type factor) always contains a number (1 to 4 digits) before the actual customer name, and I would like to remove that number. The data set currently looks like this:

    Kunden.Nr..Kurzname             Name..Vorname           Adresse              Postfach  PLZ
    1    1529 33ER TAXI AG              33er Taxi AG    Jägerstrasse 5                  <NA> 4016
    2      2384 4EYES GMBH                4eyes GmbH      Grubenweg 25                  <NA> 4153
    3 1548 A. SCHULMANN AG           A. Schulmann AG    Kernstrasse 10                  <NA> 8004
    4     3427 AAA DENT AG               AAA Dent AG  Die Zahnärzte.ch Centralbahnstrasse 20 4051
    5 555 AARE SEELAND MOB     Aare Seeland mobil AG   Hauptstrasse 93                  <NA> 2560
    6 856 AASTRA TELECOM S Aastra Telecom Schweiz AG Schulhausgasse 24                  <NA> 3113

And I would like to have it like this:

    Kunden.Nr..Kurzname             Name..Vorname           Adresse              Postfach  PLZ
    1    33ER TAXI AG              33er Taxi AG    Jägerstrasse 5                  <NA> 4016
    2      4EYES GMBH                4eyes GmbH      Grubenweg 25                  <NA> 4153
    3 A. SCHULMANN AG           A. Schulmann AG    Kernstrasse 10                  <NA> 8004
    4     AAA DENT AG               AAA Dent AG  Die Zahnärzte.ch Centralbahnstrasse 20 4051
    5 AARE SEELAND MOB     Aare Seeland mobil AG   Hauptstrasse 93                  <NA> 2560
    6 AASTRA TELECOM S Aastra Telecom Schweiz AG Schulhausgasse 24                  <NA> 3113

Basically, I need to remove everything up to and including the first space. I figured out that I probably have to use gsub, but I haven't used R in a long time. Any help is highly appreciated.

Upvotes: 1

Views: 108

Answers (3)

Alexey Ferapontov

Reputation: 5169

You can simply do gsub("^[0-9]{1,4}\\s","",df$Kunden.Nr..Kurzname)
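A minimal, self-contained sketch of this one-liner, assuming a hypothetical data frame df built from three rows of the question's data and stored as a factor, as in the question. gsub() coerces the factor to character, so the result is assigned back as a plain character column.

```r
# Hypothetical sample data: three rows from the question, first column as a factor
df <- data.frame(
  Kunden.Nr..Kurzname = factor(c("1529 33ER TAXI AG", "2384 4EYES GMBH",
                                 "555 AARE SEELAND MOB")),
  PLZ = c(4016, 4153, 2560)
)

# Remove a leading 1-4 digit number plus the single whitespace character after it.
# gsub() converts the factor to character first, so the column becomes character.
df$Kunden.Nr..Kurzname <- gsub("^[0-9]{1,4}\\s", "", df$Kunden.Nr..Kurzname)

print(df$Kunden.Nr..Kurzname)
```

Because the pattern is anchored with ^, only a number at the very start is removed. Digits inside the name, such as "33ER", are left untouched.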

Upvotes: 0

K. Rohde

Reputation: 9676

The other answers are somewhat overcomplicated. Here is a more straightforward suggestion that does exactly what you asked.

# DF is your data.frame

FindFirstSpace <- regexpr(" ", DF$Kunden.Nr..Kurzname, fixed = TRUE)
DF$Kunden.Nr..Kurzname <- substr(DF$Kunden.Nr..Kurzname, FindFirstSpace + 1, 1000)

regexpr returns the position of the first occurrence of " " in each element of your character vector. Note that regexpr normally treats the pattern as a regular expression, but fixed = TRUE makes it match the string literally. Then take the substring starting right after the first space. For the stop value, any sufficiently large number will do.
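A runnable sketch of the same regexpr()/substr() approach, assuming a hypothetical DF built from three rows of the question. As a swap-in that is not part of the answer above, nchar() supplies an exact stop value instead of a fixed large number such as 1000.

```r
# Hypothetical sample data: three rows from the question, first column as a factor
DF <- data.frame(
  Kunden.Nr..Kurzname = factor(c("1529 33ER TAXI AG", "3427 AAA DENT AG",
                                 "856 AASTRA TELECOM S"))
)

# Work on the labels as a plain character vector
names_chr <- as.character(DF$Kunden.Nr..Kurzname)

# Position of the first literal space in each string (-1 where there is none)
first_space <- regexpr(" ", names_chr, fixed = TRUE)

# Keep everything after that space; a -1 becomes start 0, which keeps the whole string
DF$Kunden.Nr..Kurzname <- substr(names_chr, first_space + 1, nchar(names_chr))
```

Unlike the anchored gsub() pattern, this cuts at the first space whatever precedes it, which matches the literal wording of the question.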

Upvotes: 0

Konrad

Reputation: 18585

I would suggest using capture groups:

gsub("^(\\d+)([[:space:]])(.+)$","\\3",x)

For example:

> x <- c("1529 33ER TAXI AG", "2384 4EYES GMBH")
> gsub("^(\\d+)([[:space:]])(.+)$","\\3",x)
[1] "33ER TAXI AG" "4EYES GMBH" 

[Images from regex101.com: demo and explanation of how the match works]
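Since the regex101.com screenshots are not reproduced here, a rough substitute is to ask R itself what each group captures. This sketch uses base R's regexec() and regmatches() on the same two example strings; the variable names m and name_part are illustrative only.

```r
x <- c("1529 33ER TAXI AG", "2384 4EYES GMBH")

# regexec() records where the full match and each capture group matched;
# regmatches() turns those positions into the captured substrings.
m <- regmatches(x, regexec("^(\\d+)([[:space:]])(.+)$", x))

# Each element: full match, group 1 (digits), group 2 (space), group 3 (name)
str(m)

# Element 4 is group 3, the part that gsub(..., "\\3", x) keeps
name_part <- vapply(m, `[`, character(1), 4)
print(name_part)
```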

Upvotes: 1
