noobmaster69

Reputation: 55

How to produce scatterplot of .txt file in R

I am currently trying to produce a scatterplot of a .txt file that is structured like this in 25 rows:

age income weight

33       63      180

25       72      220 

However, when I try to read it in as a CSV and then produce a scatterplot with the following code:

my_input <- read.csv2('dataInput.txt', sep = '\t', header = T)

plot(x = my_input$ageX, y = my_input$weightY)

I get an error message. I also notice that there is now a period between 'age', 'income', and 'weight', which I don't understand, since I would expect a comma between them. The error message is as follows:

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Any ideas on how to actually get a scatterplot of the data?

Edit: executing

head(my_input)

age. income. weight
1  56     63     185
2  38     72     156
3  28     75     178
4  49     59     205
5  69     65     235
6  19     70     195

Edit:

str(my_input)

age.income.weight: Factor w/ 18 levels "56  63     185",..: 1 2 3 4 5 6 7 8 9 10 ...
summary(my_input)
age.income.weight

 56     63     185: 1     
 38     72     156: 1     
 28     75     178: 1     
 49     59     205: 1     
 69     65     235: 1     
 19     70     195: 1     
 (Other)          :19     

Upvotes: 1

Views: 2221

Answers (1)

dc37

Reputation: 16178

Based on the edits to your question, the problem is in how the txt file is loaded. The output of str(my_input) shows a single factor column named age.income.weight, so each line was read as one value. Looking at the structure of the text file, the columns are separated by a varying number of spaces rather than by tabs, so sep = '\t' never splits them.

One way to get it to work is to build the data frame from scratch by reading the file with readLines:

my_input <- readLines("crime_input.txt")
my_input <- unlist(strsplit(my_input," "))

Now you can see that splitting on a single space leaves many empty strings, one for each extra space:

> my_input
  [1] "age"    "income" "crimes" "16"     ""       ""       ""       ""       "63"     ""       ""       ""      
 [13] ""       "23"     "18"     ""       ""       ""       ""       "72"     ""       ""       ""       ""      
 [25] "25"     "18"     ""       ""       ""       ""       "75"     ""       ""       ""       ""       "22"    
 [37] "19"     ""       ""       ""       ""       "59"     ""       ""       ""       ""       "16"     "19"    
 [49] ""       ""       ""       ""       "65"     ""       ""       ""       ""       "19"     "19"     ""      
 [61] ""       ""       ""       "70"     ""       ""       ""       ""       "19"     "20"     ""       ""      
 [73] ""       ""       "78"     ""       ""       ""       ""       "18"     "21"     ""       ""       ""      
 [85] ""       "35"     ""       ""       ""       ""       "11"     "21"     ""       ""       ""       ""      
 [97] "53"     ""       ""       ""       ""       "15"     "23"     ""       ""       ""       ""       "28"    
[109] ""       ""       ""       ""       ""       "9"      "27"     ""       ""       ""       ""       "56"    
[121] ""       ""       ""       ""       "16"     "28"     ""       ""       ""       ""       "52"     ""      
[133] ""       ""       ""       "14"     "29"     ""       ""       ""       ""       "63"     ""       ""      
[145] ""       ""       "25"     "30"     ""       ""       ""       ""       "46"     ""       ""       ""      
[157] ""       "17"     "30"     ""       ""       ""       ""       "55"     ""       ""       ""       ""      
[169] "19"     "31"     ""       ""       ""       ""       "29"     ""       ""       ""       ""       ""      
[181] "8"      "32"     ""       ""       ""       ""       "55"     ""       ""       ""       ""       "22"    
[193] "32"     ""       ""       ""       ""       "62"     ""       ""       ""       ""       "25"    

Next, convert everything to numeric (the header names and the empty strings become NA, with a coercion warning) and drop the NA values:

my_input <- as.numeric(my_input)
my_input <- my_input[!is.na(my_input)]

To get:

> my_input
 [1] 16 63 23 18 72 25 18 75 22 19 59 16 19 65 19 19 70 19 20 78 18 21 35 11 21 53 15 23 28  9 27 56 16 28 52 14
[37] 29 63 25 30 46 17 30 55 19 31 29  8 32 55 22 32 62 25
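The empty strings above come from splitting on a single space. As a minimal sketch, assuming the same whitespace-separated layout, you could split on runs of whitespace instead, so that nothing needs to be filtered out afterwards. The `lines` vector below is a made-up two-row sample standing in for the result of readLines, and `tokens` and `values` are names chosen only for illustration:

```r
# Hypothetical sample standing in for readLines("crime_input.txt")
lines <- c("age income crimes",
           "16     63     23",
           "18     72     25")

# Split on one or more whitespace characters so no empty strings are produced
tokens <- unlist(strsplit(trimws(lines), "\\s+"))

# Drop the three header tokens, then convert the remaining values to numeric
values <- as.numeric(tokens[-(1:3)])
print(values)
```

Because the header tokens are removed by position, this version does not rely on the NA coercion warning to discard them.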

Finally, we can fill a matrix with this vector:

my_input <- matrix(my_input, nrow = 3, ncol = length(my_input)/3)
> my_input
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
[1,]   16   18   18   19   19   19   20   21   21    23    27    28    29    30    30    31    32    32
[2,]   63   72   75   59   65   70   78   35   53    28    56    52    63    46    55    29    55    62
[3,]   23   25   22   16   19   19   18   11   15     9    16    14    25    17    19     8    22    25

Now, we can transpose the matrix, convert it to a data frame, and set the column names:

my_input <- as.data.frame(t(my_input))
colnames(my_input) <- c("age","income","crimes")

And finally, you get:

> head(my_input)
   age income crimes
1   16     63     23
2   18     72     25
3   18     75     22
4   19     59     16
5   19     65     19
6   19     70     19
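The fill-then-transpose steps can probably be shortened. The sketch below, which uses only the first two rows of the data shown above, fills the matrix row by row with byrow = TRUE so that no call to t() is needed. The names `values` and `my_df` are placeholders for illustration:

```r
# Values in file order: age, income, crimes for each row
values <- c(16, 63, 23, 18, 72, 25)

# byrow = TRUE fills the matrix one row at a time, so each row of the
# file becomes one row of the matrix and no transpose is required
my_df <- as.data.frame(matrix(values, ncol = 3, byrow = TRUE))
colnames(my_df) <- c("age", "income", "crimes")
print(my_df)
```

This should give the same layout as the transposed version, with one observation per row.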

And if you check the format of my_input:

> str(my_input)
'data.frame':   18 obs. of  3 variables:
 $ age   : num  16 18 18 19 19 19 20 21 21 23 ...
 $ income: num  63 72 75 59 65 70 78 35 53 28 ...
 $ crimes: num  23 25 22 16 19 19 18 11 15 9 ...

Now you can sort the rows by age and plot the data:

my_input <- my_input[order(my_input$age), ]
plot(x = my_input$age, y = my_input$crimes, type = "b")


You can now work with this data frame. I hope this helps you solve the issue.
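A shorter route may also work, assuming the file really is just whitespace-separated columns with a header line: read.table uses sep = "" by default, which treats any run of spaces or tabs as a single delimiter. The sketch below writes a made-up two-row sample to a temporary file so that it is self-contained; with the real data you would pass your own file name instead of `tmp`:

```r
# Write a small hypothetical sample to a temporary file (stand-in for the real file)
tmp <- tempfile(fileext = ".txt")
writeLines(c("age income crimes",
             "16     63     23",
             "18     72     25"), tmp)

# With the default sep = "", any run of whitespace separates the columns
my_input <- read.table(tmp, header = TRUE)
str(my_input)

# type = "p" (the default) draws points only, i.e. a plain scatterplot
plot(x = my_input$age, y = my_input$crimes, type = "p")
```

Note that the columns are referenced by the names in the header (age, crimes), not by names such as ageX, which do not exist in the data.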

Upvotes: 1
