Reactormonk

Reputation: 21690

R merge with itself

Can I merge data like this

name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70

by the second column, using the values in the first column as column names?

     name             at_rank to_center predicted
#797 "Stachy, Poland" 1       4.70      4.70

Upon request, the whole set of data: http://sprunge.us/cYSJ

Upvotes: 4

Views: 534

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Reading the data in should not be a problem as long as the strings that contain commas are quoted (which they seem to be). Using read.csv with the header=FALSE argument does the trick with the data you shared. (Of course, if the data file had a header row, drop that argument.)
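
As a small illustration of the quoting point, here is a sketch that parses the four sample rows from the question inline instead of downloading the full file. The variable names `txt` and `sample_df` are made up for this example, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# The four sample rows from the question, held in a string (illustrative only).
txt <- 'name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70'

# read.csv treats double quotes as quoting characters and does not treat
# "#" as a comment character, so "Stachy, Poland" stays in one field
# and "#797" is read as ordinary text.
sample_df <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

str(sample_df)
```

Because the third column mixes a place name with numbers, it comes back as character rather than numeric.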

From there, you have several options. Here are two.

  1. reshape (base R) works fine for this:

    myDF <- read.csv("http://sprunge.us/cYSJ", header=FALSE)
    myDF2 <- reshape(myDF, direction="wide", idvar="V2", timevar="V1")
    head(myDF2)
    #    V2                V3.name V3.at_rank V3.to_center V3.predicted
    # 1  #1                Kitoman          1         2.41         2.41
    # 5  #2                Hosaena          2         4.23         9.25
    # 9  #3 Vinzelles, Puy-de-Dôme          1         5.20         5.20
    # 13 #4     Whitelee Wind Farm          6         3.29         8.07
    # 17 #5    Steveville, Alberta          1         9.59         9.59
    # 21 #6        Rocher, Ardèche          1         0.13         0.13
    
  2. The reshape2 package is also useful in these cases. It has simpler syntax and the output is also a little "cleaner" (at least in terms of variable names).

    library(reshape2)
    myDFw_2 <- dcast(myDF, V2 ~ V1)
    # Using V3 as value column: use value.var to override.
    head(myDFw_2)
    #       V2 at_rank                                       name predicted to_center
    # 1     #1       1                                    Kitoman      2.41      2.41
    # 2    #10       4                            Icaraí de Minas      6.07      8.19
    # 3   #100       2        Scranton High School (Pennsylvania)      5.78      7.63
    # 4  #1000       1                  Bat & Ball Inn, Clanfield      2.17      2.17
    # 5 #10000       3                                     Tăuteu      1.87      5.87
    # 6 #10001       1 Oak Grove, Northumberland County, Virginia      5.84      5.84
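
If you stay with base reshape, the "V3." prefixes can be stripped afterwards, and the value columns can be converted from text to numbers. The following is only a sketch on a two-record subset copied from the output above; the names `long` and `wide` are illustrative and not part of the original code.

```r
# Two records in long form. V3 is character because it mixes names and numbers.
long <- data.frame(
  V1 = rep(c("name", "at_rank", "to_center", "predicted"), times = 2),
  V2 = rep(c("#1", "#2"), each = 4),
  V3 = c("Kitoman", "1", "2.41", "2.41",
         "Hosaena", "2", "4.23", "9.25"),
  stringsAsFactors = FALSE
)

# Same reshape call as in option 1.
wide <- reshape(long, direction = "wide", idvar = "V2", timevar = "V1")

# Drop the "V3." prefix that reshape adds to the new column names.
names(wide) <- sub("^V3\\.", "", names(wide))

# Let R guess a type per column; as.is = TRUE keeps text as character.
wide[] <- lapply(wide, type.convert, as.is = TRUE)

str(wide)
```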
    

Upvotes: 2

frankc

Reputation: 11473

I think in this case all you really need to do is transpose, convert to a data.frame, set the colnames from the first row, and then remove the first row. It might be possible to skip the last step through some combination of arguments to data.frame, but I don't know what they are offhand.
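
A rough sketch of those steps, assuming the data holds a single record (one value in the second column), as in the four rows shown in the question. With several records a plain transpose no longer lines up, so this is not a general solution. The names `long` and `wide` are illustrative.

```r
# The question's sample record in long form (values kept as character).
long <- data.frame(
  V1 = c("name", "at_rank", "to_center", "predicted"),
  V2 = "#797",
  V3 = c("Stachy, Poland", "1", "4.70", "4.70"),
  stringsAsFactors = FALSE
)

# Transpose the name and value columns: a 2 x 4 character matrix.
m <- t(long[, c("V1", "V3")])

# Convert to a data.frame, take the column names from the first row,
# then drop that row.
wide <- as.data.frame(m, stringsAsFactors = FALSE)
colnames(wide) <- as.character(unlist(wide[1, ]))
wide <- wide[-1, , drop = FALSE]

# Use the record id as the row name.
rownames(wide) <- long$V2[1]

wide
```

All columns are still character at this point, so numeric fields would need converting before use.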

Upvotes: 0

Btibert3

Reputation: 40146

Look at Hadley Wickham's reshape package. If I understand correctly, you are just pivoting your data from long to wide.

Upvotes: 1
