Reputation: 243
             processor transistor_count     doi          designer   process   area
1           Intel 4004            2,300    1971             Intel 10,000 nm 12 mm²
2           Intel 8008            3,500    1972             Intel 10,000 nm 14 mm²
3           Intel 8080            4,500    1974             Intel  6,000 nm 20 mm²
4        Motorola 6800            4,100    1974          Motorola  6,000 nm 16 mm²
5             RCA 1802            5,000    1974               RCA  5,000 nm 27 mm²
6             TMS 1000            8,000 1974[7] Texas Instruments  8,000 nm   <NA>
7  MOS Technology 6502         3,510[8]    1975    MOS Technology  8,000 nm 21 mm²
8           Intel 8085            6,500    1976             Intel  3,000 nm 20 mm²
9            Zilog Z80            8,500    1976             Zilog  4,000 nm 18 mm²
10          Intel 8086           29,000    1978             Intel  3,000 nm 33 mm²
11       Motorola 6809            9,000    1978          Motorola  5,000 nm 21 mm²
12          Intel 8088           29,000    1979             Intel  3,000 nm 33 mm²
13      Motorola 68000           68,000    1979          Motorola  3,500 nm 44 mm²
14           WDC 65C02        11,500[9]    1981               WDC  3,000 nm  6 mm²
Hello my friends, I am trying to get rid of some characters in the columns "transistor_count" and "doi". As you can see, there are these "[x]" markers that prevent me from converting those vectors to "numeric" vectors and operating on them. I have also identified things like "~" and some other characters in those vectors. How can I eliminate a "[8]" without eliminating an "8" from the numbers I want to use? Also, is there a way to check how many of those problematic characters there are and what they look like?
I know I can use gsub to replace the problematic characters I have already spotted, but what if the data is too big to check them one by one? I have tried to use check.character(), but it didn't even run.
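For the inspection part of the question, one option is to extract every character that is not a digit and tabulate it, so nothing has to be checked by eye. This is a minimal sketch, assuming the column is a character vector (wrap it in as.character() first if it is a factor); the vector `x` is a hypothetical sample made of values from the table plus a made-up "~" entry:

```r
# Hypothetical sample: values copied from the table above, plus a made-up
# "~" entry standing in for the other stray characters mentioned in the question
x <- c("2,300", "3,510[8]", "11,500[9]", "~4,000")

# Every non-digit character, one element per occurrence
bad <- unlist(regmatches(x, gregexpr("[^0-9]", x)))

# How many of each problematic character there are
print(table(bad))

# Whole footnote markers such as "[8]", to see what they look like
markers <- unlist(regmatches(x, gregexpr("\\[[0-9]+\\]", x)))
print(unique(markers))

# Which entries contain anything other than digits and commas
print(x[grepl("[^0-9,]", x)])
```

On a real data frame, `x` would be something like df$transistor_count; the table of `bad` then shows which patterns still need a gsub rule.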
Upvotes: 1
Views: 38
Reputation: 887158
Loop through the columns of interest with lapply and use sub to match the pattern "[" followed by one or more digits and then a closing bracket "]", replacing it with a blank (""):
# remove footnote markers such as "[8]"; digits outside the brackets are kept
df[c("transistor_count", "doi")] <- lapply(df[c("transistor_count", "doi")],
                                           function(x) sub("\\[\\d+\\]", "", x))
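Removing the markers alone still leaves the thousands separators, so as.numeric("2,300") would return NA with a warning. A minimal sketch of the full conversion, assuming the columns hold character strings like those in the table and that commas only ever appear as thousands separators; `to_number` is a hypothetical helper name:

```r
# Hypothetical helper: strip footnote markers like "[8]", then commas, then convert
to_number <- function(x) {
  x <- as.character(x)                  # also works if the column is a factor
  x <- gsub("\\[[0-9]+\\]", "", x)      # drop "[8]" but keep the digit 8 elsewhere
  x <- gsub(",", "", x, fixed = TRUE)   # drop thousands separators
  as.numeric(x)                         # anything still non-numeric becomes NA
}

# Small hypothetical sample taken from the table in the question
df <- data.frame(
  transistor_count = c("2,300", "3,510[8]", "11,500[9]"),
  doi = c("1971", "1974[7]", "1981"),
  stringsAsFactors = FALSE
)

cols <- c("transistor_count", "doi")
df[cols] <- lapply(df[cols], to_number)
str(df)
```

Any entry that still holds a stray character such as "~" becomes NA (with a warning), so which(is.na(df$transistor_count)) points at the rows worth inspecting for another pattern.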
Upvotes: 1