amjear
amjear

Reputation: 85

Combining header (in a separate txt file) and data (in a separate csv file) in R to make a data set

I have a txt file which has a list of headers and the type of value for each column as shown below:

header.txt:

airport string  
city    string  
country string  

and the data as :

data.csv:

SYD SYD AU
CCU CCU IND
MSP MSP US

I would like to convert the data to:

airport city country
  SYD   SYD AU
  CCU   CCU IND
  MSP   MSP US 

My idea was to read the header and data file separately, take only the column names from the header file, transpose them, save it as a separate csv file. The next step was to combine the two csv files.

I have used data.table to read the file

 monthlyHeader <- fread("header.txt")

But the "airport" becomes header and "city" and " country" are the values of the column "airport"

  airport string 
1  city  string 
2  country string 

monthlyHeader <- t(monthlyHeader)

the transpose of monthlyHeader does not give the desired result.

Is there a better way to implement this in R ?

Upvotes: 1

Views: 954

Answers (2)

Uwe
Uwe

Reputation: 42544

Using data.table as requested, this can be achieved as follows

library(data.table)
header <- fread("header.txt", header = FALSE)
header
        V1     V2
1: airport string
2:    city string
3: country string
data <- fread("data.csv", col.names = header$V1)
data
   airport city country
1:     CCU  CCU     IND
2:     MSP  MSP      US

Or, in one line of code

data <- fread("data.csv", col.names = fread("header.txt", header = FALSE)$V1)

Upvotes: 1

Mike Stanley
Mike Stanley

Reputation: 1480

For importing fixed-width files, I would recommend read_fwf from the readr package, which helps you solve most common problems with parsing fixed-width files. Here is an example of parsing your files via guessing column endings via the empty columns - there are other possible strategies if this fails on your full file:

library(readr)
fwf <- "airport string  
city    string  
country string   "
fwf2 <- "SYD SYD AU
CCU CCU IND
MSP MSP US"

d1 <- read_fwf(fwf, fwf_empty(fwf, col_names = c("name", "type")))
d1
#> # A tibble: 3 x 2
#>      name   type
#>     <chr>  <chr>
#> 1 airport string
#> 2    city string
#> 3 country string

The task of using the column names from this data frame is now quite trivial just by specifying them when we import the second file:

d2 <- read_fwf(fwf2, fwf_empty(fwf2, col_names = d1$name))
d2
#> # A tibble: 3 x 3
#>   airport  city country
#>     <chr> <chr>   <chr>
#> 1     SYD   SYD      AU
#> 2     CCU   CCU     IND
#> 3     MSP   MSP      US

Upvotes: 0

Related Questions