Reputation: 3
My data consists of x-axis points and a y value for each point. The x points are not evenly distributed. I need to visualize how the x points are clustered and how the y values behave within those clusters. To see how the x values are clustered I can draw a density plot of x, but that does not reflect the y values in each cluster. For example, if 100 points (let's say) are very close to each other on the x axis and all have positive y values, I want the curve to go up at that location; if those 100 points have negative y values, I want the curve to go below the zero line; and if they have both positive and negative y values, I want the curve to stay around zero. Similarly, even if all 100 points have positive values, if they are scattered over a long distance I want the curve to stay near the zero line.
In short, both the density of the x points and their y values matter to me, and I want to plot a smooth line. Could anyone help me with this? (stat_smooth did not work for me, as it makes my plot an almost straight line.)
Here are my x and y values (I did not know how to insert a table here).
x axis values:
x_value
86645
87018
987522
989433
989934
991055
995476
9987548
9987885
9988511
9988522
9991975
9992246
9992428
9993646
9993668
9994285
9994309
9994317
9994425
9994437
9994581
9994856
9994878
9995045
9995072
9995103
9995142
9995153
9995521
9996329
9996568
9997122
9997269
9997277
9997282
9998216
9999596
9999838
10001799
10004506
10007993
10008597
10009002
10009022
10009225
10009530
10009657
10010526
10012288
10012897
10012899
10012901
10014614
10014903
10015001
10015039
10015059
10015340
10015342
10016761
10018152
10020062
10024053
10024058
10024284
10024318
10025853
10026758
10028903
10029674
10029835
10030862
10031185
10031737
10033603
10035054
10035100
10036294
10036678
10036691
10036698
10036783
10037234
10037289
10037388
10039332
10039431
10042426
10042469
10042471
10043156
10043218
10043225
10045396
10045986
10046533
10046604
10047066
10047179
10047865
10048106
10048136
10048873
10049328
10049724
10049961
10049974
10050014
10050020
10050039
10050041
10050450
10050451
10050558
10050561
10051330
10051336
10052228
y axis values:
y_value
16.7
14.3
10.5
18.2
20.0
16.7
14.3
10.4
27.3
22.2
11.1
-18.2
-10.1
-13.3
-26.4
-13.3
-15.4
14.3
15.4
11.7
26.7
18.2
64.7
21.2
20.0
11.8
-17.9
25.0
14.2
20.0
18.2
12.5
12.5
10.5
11.1
12.5
14.3
-20.0
12.5
-20.0
16.7
13.3
18.2
20.0
30.0
20.0
11.8
-18.8
20.0
20.0
12.5
18.8
13.3
-15.4
18.2
18.9
28.6
20.0
12.5
16.1
15.4
10.5
13.3
29.7
23.1
18.2
14.3
12.5
12.5
16.7
11.1
20.0
18.2
18.2
13.2
13.3
11.8
15.4
14.3
23.8
18.2
33.3
18.2
-12.5
12.5
23.1
21.7
14.3
16.7
11.1
16.7
12.5
11.1
12.5
18.2
12.5
11.0
20.0
18.2
15.8
10.5
10.2
10.5
14.3
11.8
25.0
13.8
16.4
16.7
-18.2
18.2
16.7
18.2
18.2
11.8
12.5
14.3
17.9
10.5
Upvotes: 0
Views: 504
Reputation: 59415
Note: In what follows, I've combined your x and y data into a data frame df with columns x and y.
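For anyone following along, here is a minimal sketch of building such a data frame. It uses only the first five pairs posted in the question; the real df would hold all of the posted values in the same way. The vector names x_value and y_value simply mirror the column headers in the question.

```r
# First five x/y pairs copied from the question; in practice the full vectors go here.
x_value <- c(86645, 87018, 987522, 989433, 989934)
y_value <- c(16.7, 14.3, 10.5, 18.2, 20.0)

# Column names x and y match what the rest of this answer expects.
df <- data.frame(x = x_value, y = y_value)
str(df)
```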
Looking at a simple scatter plot, it appears that your data is grouped more or less into five clusters:
with(df,plot(x,y))
To see the distribution in both the x and y directions you need a 2-dimensional kernel density estimate, which is available in the MASS package via kde2d(...). You can then plot this in 3 dimensions (with the density as z) using the rgl package.
library(MASS) # for kde2d(...)
library(rgl)  # for open3d(...), surface3d(...) and title3d(...)
dens <- kde2d(df$x, df$y)      # 2-D kernel density estimate on a regular grid
zlim <- range(dens$z)
pal  <- rev(heat.colors(10))   # named pal to avoid masking base palette()
col  <- pal[9*(dens$z - zlim[1])/diff(zlim) + 1] # assign colors to heights for each point
# rescale each axis to unit length so the surface is not squashed
with(dens, open3d(scale = c(x = 1/diff(range(x)), y = 1/diff(range(y)), z = 1/diff(range(z)))))
with(dens, surface3d(x, y, z, color = col))
title3d(xlab = "X", ylab = "Y")
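The 2-D density shows where points pile up, but the question asks for a single smooth line that rises where many positive y values cluster, falls where negative ones cluster, and stays near zero where points are sparse or of mixed sign. One way to sketch that, as an assumption on my part rather than something kde2d gives you, is a kernel-weighted sum of y along x. The helper name signed_density, the Gaussian kernel, the bandwidth, and the toy data below are all illustrative choices.

```r
# Hypothetical helper (not from any package): kernel-weighted sum of y along x.
# For each grid point x0 it computes (1/n) * sum_i y_i * K_bw(x0 - x_i) with a
# Gaussian kernel, so dense same-sign clusters give large absolute values, while
# sparse or mixed-sign regions stay near zero.
signed_density <- function(x, y, grid, bw = bw.nrd0(x)) {
  sapply(grid, function(x0) sum(y * dnorm(x0, mean = x, sd = bw)) / length(x))
}

# Toy data (made up for illustration): a tight positive cluster, a tight
# negative cluster, a mixed-sign cluster, and a few widely scattered positives.
set.seed(1)
x <- c(rnorm(100, 10, 0.2), rnorm(100, 20, 0.2), rnorm(100, 30, 0.2),
       seq(40, 80, length.out = 10))
y <- c(rep(15, 100), rep(-15, 100), rep(c(15, -15), 50), rep(15, 10))

grid <- seq(min(x), max(x), length.out = 400)
sd_line <- signed_density(x, y, grid, bw = 0.5)

plot(grid, sd_line, type = "l", xlab = "x", ylab = "kernel-weighted sum of y")
abline(h = 0, lty = 2)  # zero reference line
```

With the data from the question, grid could span range(df$x), and bw controls how close points must be to count as one cluster, so it is the main thing to tune.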
Upvotes: 1