asikhalaban

Reputation: 125

Unable to get a numeric summary for one column in R

I have a data frame called df.

    id       time  internet lat lng
103  1 1385913600 14.057844   1   0
247  2 1385913600 14.062213   2   0
391  3 1385913600 14.066863   3   0
535  4 1385913600 14.045190   4   0
679  5 1385913600 12.772210   5   0
823 10 1385913600  8.101804  10   0

I added a new column and set every value to 0, using one of the methods below:

df["cluster"] <- 0
df$cluster <- 0

Then my algorithm changes the value of df$cluster for each row. This is how I update it:

clusternumber <- clusternumber + 1
df$cluster[df$id == minid] <- clusternumber

In the end I get the result I'm looking for, but I've run into a new problem. When I try to summarize the result, the output for the cluster column looks strange:

> summary(df)
       id           internet            lat              lng            cluster    
 Min.   :    1   Min.   :   0.00   Min.   :  1.00   Min.   :  0.00   1      : 121  
 1st Qu.: 2500   1st Qu.:  15.57   1st Qu.: 25.25   1st Qu.: 25.00   2      : 121  
 Median : 5000   Median :  36.09   Median : 51.00   Median : 49.50   3      : 121  
 Mean   : 5000   Mean   :  75.73   Mean   : 50.50   Mean   : 49.51   4      : 121  
 3rd Qu.: 7501   3rd Qu.:  78.88   3rd Qu.: 75.75   3rd Qu.: 75.00   9      : 121  
 Max.   :10000   Max.   :6663.23   Max.   :100.00   Max.   :100.00   15     : 121  
                                                                     (Other):9272    

How should I create the new column, or change its values, so that it is summarized like the other columns? Right now I get this:

> summary(df$cluster)
      1       2       3       4       9      15      16      17      34      52      85     147       8       6       7      36 
    121     121    other(2727)

Upvotes: 0

Views: 84

Answers (1)

www

Reputation: 39154

The output of your summary function clearly shows that the cluster column is a factor. Below is a simple example.

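# Create an example data frame
# (stringsAsFactors = TRUE is needed in R >= 4.0.0 for Col_f to become a factor)
dat <- data.frame(Col_f = c("1.1", "1.1", "2.1", "2.1", "3.1", "3.1", 
                            "4.1", "4.1", "4.1"),
                  Col_n = c(1.1, 1.1, 2.1, 2.1, 3.1, 3.1, 4.1, 4.1, 4.1),
                  stringsAsFactors = TRUE)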

# Check the structure of the data frame
str(dat)
# 'data.frame': 9 obs. of  2 variables:
# $ Col_f: Factor w/ 4 levels "1.1","2.1","3.1",..: 1 1 2 2 3 3 4 4 4
# $ Col_n: num  1.1 1.1 2.1 2.1 3.1 3.1 4.1 4.1 4.1

# Use summary
summary(dat)
#   Col_f       Col_n      
# 1.1:2   Min.   :1.100  
# 2.1:2   1st Qu.:2.100  
# 3.1:2   Median :3.100  
# 4.1:3   Mean   :2.767  
#         3rd Qu.:4.100  
#         Max.   :4.100

Notice that for Col_f the summary function simply reports the number of observations in each level.
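
A quick way to confirm this on your own data is to check the class of the column before calling summary. This is only a minimal sketch: the vector below is a made-up stand-in for df$cluster, not your actual data.

```r
# Hypothetical stand-in for df$cluster; the values are made up for illustration
cluster <- factor(c("1", "2", "2", "15", "15", "15"))

class(cluster)      # "factor"
is.factor(cluster)  # TRUE

# For a factor, summary() returns the count per level, the same counts as table()
counts_summary <- summary(cluster)
counts_table <- table(cluster)
```

If is.factor() returns TRUE for your column, the per-level counts in your output are expected behavior rather than an error.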

To convert the factor to numeric, convert the column to character first, then to numeric. Here is an example.

# Convert the column of factor to numeric
dat$Col_fn <- as.numeric(as.character(dat$Col_f))

Notice that Col_fn is the same as Col_n.

# Call str again
str(dat)
# 'data.frame': 9 obs. of  3 variables:
# $ Col_f : Factor w/ 4 levels "1.1","2.1","3.1",..: 1 1 2 2 3 3 4 4 4
# $ Col_n : num  1.1 1.1 2.1 2.1 3.1 3.1 4.1 4.1 4.1
# $ Col_fn: num  1.1 1.1 2.1 2.1 3.1 3.1 4.1 4.1 4.1
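
The help page for factor also mentions an equivalent idiom, as.numeric(levels(f))[f], which it describes as slightly more efficient because each distinct level is converted only once. A small sketch, where f is just a standalone copy of the Col_f values:

```r
# Standalone copy of the Col_f values from the example data frame
f <- factor(c("1.1", "1.1", "2.1", "2.1", "3.1", "3.1", "4.1", "4.1", "4.1"))

# Convert each distinct level once, then index by the factor's integer codes
via_levels <- as.numeric(levels(f))[f]

# The two-step conversion, for comparison
via_character <- as.numeric(as.character(f))

identical(via_levels, via_character)  # TRUE
```

Either form works; the two-step version is usually easier to read.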

If you convert a factor directly to numeric, you get the internal level codes instead of the original values. Here is an example.

# Convert the column of factor to numeric
dat$Col_ff <- as.numeric(dat$Col_f)

# Use str again
str(dat)
# 'data.frame': 9 obs. of  4 variables:
# $ Col_f : Factor w/ 4 levels "1.1","2.1","3.1",..: 1 1 2 2 3 3 4 4 4
# $ Col_n : num  1.1 1.1 2.1 2.1 3.1 3.1 4.1 4.1 4.1
# $ Col_fn: num  1.1 1.1 2.1 2.1 3.1 3.1 4.1 4.1 4.1
# $ Col_ff: num  1 1 2 2 3 3 4 4 4

Notice that Col_ff holds the values 1 to 4, which are the level codes rather than the original values.
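
Applied to the question, the same conversion can be done in place on the cluster column. This is a sketch on a made-up miniature of df (the id and cluster values are illustrative only), assuming the cluster labels are all valid numbers:

```r
# Hypothetical miniature of the asker's df, with cluster stored as a factor
df <- data.frame(id = 1:6,
                 cluster = factor(c("1", "2", "2", "15", "15", "15")))

# Convert the labels, not the level codes, back to numbers
df$cluster <- as.numeric(as.character(df$cluster))

str(df$cluster)      # num [1:6] 1 2 2 15 15 15
summary(df$cluster)  # now reports Min., quartiles, Median, Mean and Max.
```

If you actually want one count per cluster, keep the column as a factor and use table(df$cluster) instead.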

Upvotes: 1
