Felipe Araya Olea

Getting rid of unwanted characters in a numeric vector

                 processor transistor_count     doi          designer   process    area
1        Intel 4004            2,300    1971             Intel 10,000 nm 12 mm²
2           Intel 8008            3,500    1972             Intel 10,000 nm 14 mm²
3           Intel 8080            4,500    1974             Intel  6,000 nm 20 mm²
4        Motorola 6800            4,100    1974          Motorola  6,000 nm 16 mm²
5             RCA 1802            5,000    1974               RCA  5,000 nm 27 mm²
6             TMS 1000            8,000 1974[7] Texas Instruments  8,000 nm    <NA>
7  MOS Technology 6502         3,510[8]    1975    MOS Technology  8,000 nm 21 mm²
8           Intel 8085            6,500    1976             Intel  3,000 nm 20 mm²
9            Zilog Z80            8,500    1976             Zilog  4,000 nm 18 mm²
10          Intel 8086           29,000    1978             Intel  3,000 nm 33 mm²
11       Motorola 6809            9,000    1978          Motorola  5,000 nm 21 mm²
12          Intel 8088           29,000    1979             Intel  3,000 nm 33 mm²
13      Motorola 68000           68,000    1979          Motorola  3,500 nm 44 mm²
14           WDC 65C02        11,500[9]    1981               WDC  3,000 nm  6 mm²

Hello my friends, I am trying to get rid of some characters in the columns "transistor_count" and "doi". As you can see, there are markers like "[x]" that prevent me from converting these vectors to numeric and operating on them. I have also identified characters like "~" and some others in these vectors. How can I remove a marker like "[8]" without removing an "8" from the numbers I want to keep? Also, is there a way to check how many of these problematic characters there are and what they look like?

I know I can use gsub to replace the problematic characters I have already spotted, but what if the data is too big to check one by one? I tried check.character(), but it didn't even run.
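For the second question (how many problematic characters there are and what they look like), one option is to extract every character that is not a digit and tabulate it, and separately list the entries that still fail to convert to a number. This is a minimal sketch using only base R (`gregexpr`, `regmatches`, `table`, `as.numeric`); the helper names `audit_chars` and `bad_values` are hypothetical, and the sample vectors are a small subset copied from the table above. A "~" or any other stray symbol would show up in the same tally.

```r
# Sample values copied from the printed data frame above (small subset).
transistor_count <- c("2,300", "3,510[8]", "11,500[9]", "8,000")
doi <- c("1971", "1974[7]", "1975", "1981")

# Hypothetical helper: collect every non-digit character in x and count them.
audit_chars <- function(x) {
  odd <- unlist(regmatches(x, gregexpr("[^0-9]", x)))
  table(odd)
}

# Hypothetical helper: return the entries that are still not numeric
# after removing commas (assumed to be thousands separators).
bad_values <- function(x) {
  x[is.na(suppressWarnings(as.numeric(gsub(",", "", x))))]
}

audit_chars(transistor_count)  # counts of ",", "[" and "]"
audit_chars(doi)
bad_values(transistor_count)   # entries that still need cleaning
bad_values(doi)
```

On a large data frame the same two calls can be run per column, e.g. with `lapply(df[c("transistor_count", "doi")], audit_chars)`, which avoids inspecting the rows one by one.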

Answers (1)

akrun

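Loop through the columns of interest with lapply and use sub to match the pattern: a literal [ followed by one or more digits and then a closing ]. Replace the match with a blank string (""). Because the brackets are part of the pattern, an 8 that is not enclosed in brackets is left untouched.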

cols <- c("transistor_count", "doi")
# Remove bracketed markers such as "[8]"; digits outside brackets are kept
df[cols] <- lapply(df[cols], function(x) sub("\\[\\d+\\]", "", x))
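A self-contained version of this approach on a few rows from the question's table might look like the sketch below. Step 1 is the `sub` call from the answer. Step 2 is an extra step not in the original answer: assuming commas are only thousands separators, it strips them with `gsub` and converts the columns with `as.numeric`. The `8` in "8,000" survives because only digits inside brackets are matched.

```r
# Small reproduction of the two problem columns (subset of the rows above).
df <- data.frame(
  processor = c("Intel 4004", "TMS 1000", "MOS Technology 6502", "WDC 65C02"),
  transistor_count = c("2,300", "8,000", "3,510[8]", "11,500[9]"),
  doi = c("1971", "1974[7]", "1975", "1981"),
  stringsAsFactors = FALSE
)

cols <- c("transistor_count", "doi")

# Step 1 (from the answer): drop bracketed markers like "[8]".
df[cols] <- lapply(df[cols], function(x) sub("\\[\\d+\\]", "", x))

# Step 2 (assumption: commas are thousands separators): strip commas,
# then convert to numeric.
df[cols] <- lapply(df[cols], function(x) as.numeric(gsub(",", "", x)))

str(df)
```

`sub` replaces only the first match in each value; if a cell can contain more than one marker, `gsub` with the same pattern removes all of them.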
