Sabor117

Reputation: 135

Using select() in R giving "Error inputs must resolve to integer column positions"

I have a script which gradually adds a number of columns to an existing data frame (df1), then takes a subset of these columns and outputs it as df2, renaming the columns at the same time.

I've used the select() function in dplyr to do this before, and it has worked on similar datasets, so I'm a bit stumped as to why it has suddenly stopped working. I've seen a few other threads about using select(), but none of them really helped with my question.

Here is the column list and first line of the data I am using:

gene_id variant_id tss_distance ma_samples ma_count maf pval_nominal slope slope_se rsid chr pos ref_allele alt gene_id_new gene_name info
ENSG00000227232.4 1_13417_C_CGAGA_b37       -16136         50       50 0.07225430   0.00908288  0.3556660 0.1354910 rs777038595   1 13417          C CGAGA ENSG00000227232    WASH7P    1

Here is the code for my selection:

parsed_columns = select(df1, chr = "chr",
                    pos = "pos",
                    ref = "ref_allele",
                    alt = "alt",
                    reffrq = "maf",
                    info = "info",
                    rs = "rsid",
                    pval = "pval_nominal",
                    effalt = "slope",
                    gene = "gene_name")

And from this I get an error saying that all of the names in the quotations do not resolve to integer positions.

I initially thought I might just have the names on the wrong side of the function (so, for example, it should be rsid = "rs") but then you have columns where it is the same on both sides (e.g. pos = "pos") and supposedly that isn't present either. So I'm a bit stuck. Any help would be appreciated.

Upvotes: 1

Views: 551

Answers (1)

Robele Baker

Reputation: 91

With dplyr, you do not need to put your column names in quotation marks. Simply giving the bare column name from the referenced data frame should suffice.

More generically,

df2 = select(df1,
             col1name = col1,
             col2name = col2
             # ... further new_name = old_name pairs
             )

This works provided that col1, col2, etc. are valid column names in df1.

Give this a try for your R code:

parsed_columns = select(df1, chr = chr,
                    pos = pos,
                    ref = ref_allele,
                    alt = alt,
                    reffrq = maf,
                    info = info,
                    rs = rsid,
                    pval = pval_nominal,
                    effalt = slope,
                    gene = gene_name)
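
If you want to check this without your full dataset, here is a minimal sketch. It assumes dplyr is installed, and the one-row df1 is only a stand-in I built from some of the columns in the sample line in your question, not your real data. The pattern is new_name = old_name, with the old names left unquoted.

```r
library(dplyr)

# Stand-in for df1: one row, using values from the sample line in the question.
# Only the columns needed for the selection are included here.
df1 <- data.frame(
  maf = 0.0722543,
  pval_nominal = 0.00908288,
  slope = 0.355666,
  rsid = "rs777038595",
  chr = 1,
  pos = 13417,
  ref_allele = "C",
  alt = "CGAGA",
  gene_name = "WASH7P",
  info = 1,
  stringsAsFactors = FALSE
)

# Select and rename in one step: new_name = old_name, old names unquoted.
parsed_columns <- select(df1,
                         chr = chr,
                         pos = pos,
                         ref = ref_allele,
                         alt = alt,
                         reffrq = maf,
                         info = info,
                         rs = rsid,
                         pval = pval_nominal,
                         effalt = slope,
                         gene = gene_name)

# The result has the new names, in the order they were listed in select().
print(names(parsed_columns))
```

The columns in the output follow the order given in select(), not their order in df1, so this also reorders the data frame as a side effect.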

Upvotes: 1
