Reputation: 83
I have a VCF file without population information. I have three text files (pop1.txt, pop2.txt and pop3.txt) containing the names of the samples. How do I add population information to that VCF file in R or another way?
Upvotes: 1
Views: 129
Reputation: 1095
There are a few ways you can do this.
Upstream, you could have named the samples with population notations included. For example, I named the fastq/bam files in an experiment No_L_1 and No_R_1 for sample number 1 in the Norwegian lake and stream dataset I had.
Use something like sed or awk to loop through the population IDs and change the VCF sample column names to something more intuitive.
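As a rough sketch of the awk route, the snippet below prefixes each sample name on the #CHROM header line with the name of the population file it appears in. The sample names (S1 to S4), the tiny input.vcf, the output name renamed.vcf and the pop_sample naming scheme are all made up for illustration; it also assumes each pop file lists one sample name per line, exactly as spelled in the VCF header.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs: one sample name per line in each population file.
printf 'S1\nS2\n' > pop1.txt
printf 'S3\n' > pop2.txt
printf 'S4\n' > pop3.txt

# Tiny stand-in VCF (tab-separated) with four samples.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
  printf '1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\n'
} > input.vcf

# The pop files are read first to build a sample -> population lookup;
# the VCF is read last and only its #CHROM line is modified.
awk -v vcf=input.vcf '
    BEGIN { FS = OFS = "\t" }
    FILENAME != vcf {                  # still reading a population file
        pop = FILENAME
        sub(/\.txt$/, "", pop)         # pop1.txt -> pop1
        popof[$1] = pop
        next
    }
    /^#CHROM/ {                        # sample columns start at field 10
        for (i = 10; i <= NF; i++)
            if ($i in popof) $i = popof[$i] "_" $i
    }
    { print }
' pop1.txt pop2.txt pop3.txt input.vcf > renamed.vcf

grep '^#CHROM' renamed.vcf
```

On a real dataset you would drop the lines that create the demo files and point the awk call at your own VCF. If you have bcftools installed, its reheader command with the -s option is another way to rename samples from a list.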
In R, read the data in using a library like vcfR, and then change the sample names on the R object. I tend to read data in and convert the vcfR object into a data.table (i.e., vcf <- read.vcfR(vcf_path) %>% as.data.table).
Regardless, it would probably be easiest to have all your population data in a single CSV if you're doing analyses in R, with column 1 being sample_id and column 2 being population.
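One possible way to build that single CSV from the three text files is sketched below. The sample names are invented, the output name populations.csv is just a placeholder, and the population label is simply taken from the file name (pop1.txt becomes pop1), which is an assumption about how you want to label things.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs: one sample name per line in each population file.
printf 'S1\nS2\n' > pop1.txt
printf 'S3\n' > pop2.txt
printf 'S4\n' > pop3.txt

{
  # Header row matching the two-column layout described above.
  printf 'sample_id,population\n'
  for f in pop1.txt pop2.txt pop3.txt; do
      pop=${f%.txt}                      # population label from the file name
      # Skip blank lines; emit "sample,population" for every other line.
      awk -v pop="$pop" 'NF { print $1 "," pop }' "$f"
  done
} > populations.csv

cat populations.csv
```

The resulting file can then be read into R and joined to the genotype data by sample_id.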
Upvotes: 0