Marco ViCo

Reputation: 51

Create species matrix with abundance (counts)

I have a dataset (a data.frame) like this one:

Site Species Count
a Abies 14
b Alnus 1
c Pinus 1
c Artem 2
n ... ...

Here n, the number of sites, is 26000. I need to convert it into a matrix like this one in R:

  Abies Alnus Pinus Artem
a    14     0     0     0
b     0     1     0     0
c     0     0     1     2
n   ...   ...   ...   ...

I came across the 'fossil' package and its create.matrix function. This function creates the matrix I need, but only with the presence (1) or absence (0) of each species at each site. I need the abundance (count) instead of presence-absence (1-0).

Upvotes: 1

Views: 2415

Answers (2)

I hope I'm not too late to answer your question.

If you type ?create.matrix in the RStudio console, you get the documentation for the function. It says that you can use your original raw data to make an abundance matrix, but you have to pass a few extra arguments: tax.name to indicate the column with the species names, locality to indicate the column with the sites, abund.col to indicate the column with the count of each species, and abund = TRUE to tell the function that we are working with abundance data.

In your case...

df <- create.matrix(x, tax.name = "Species",
   locality = "Site",
   abund.col = "Count",
   abund = TRUE)

Here x is the name of your data.frame containing those three columns (Site, Species and Count). Note that this creates a matrix where the rows are the species and the columns are the sites. If you want it the other way around, use t(df) to transpose it so that the species become the columns and the sites become the rows.

Hope this was helpful. You can also check the rest of the function's documentation for the remaining arguments.

Also note that the output of create.matrix is not a data.frame, so you might want to convert it with as.data.frame while transposing:

abundance.matrix <- as.data.frame(t(df))

Upvotes: 2

Vijayalakshmi Ramesh

Reputation: 52

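import pandas as pd
import numpy as np

# One single-cell data frame per observation: index = Site, column = Species
df1 = pd.DataFrame([14], index=['A'], columns=['Abies'])
df2 = pd.DataFrame([1], index=['B'], columns=['Alnus'])
df3 = pd.DataFrame([1], index=['C'], columns=['Pinus'])
df4 = pd.DataFrame([2], index=['C'], columns=['Artem'])

# Outer-merge on the index so that every site and every species is kept
A_B = pd.merge(df1, df2, how='outer', left_index=True, right_index=True)
C_C = pd.merge(df3, df4, how='outer', left_index=True, right_index=True)
new = pd.merge(A_B, C_C, how='outer', left_index=True, right_index=True)

# Species missing from a site come out as NaN; treat them as zero counts
new = new.replace(np.nan, 0)
# The NaNs forced float columns, so cast back to integer counts
new = new.astype(int)
new
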
  • Import the two libraries: numpy and pandas

  • Create one data frame per count, with the site as the index and the species as the column

  • Merge all those data frames with an outer join on the index

  • Replace NaN values with 0

  • Convert the values from 'float' to 'int'
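
Building one data frame per observation and merging them by hand does not scale to the 26000 sites mentioned in the question. As a minimal sketch of the same idea, assuming the long-format table has already been loaded into a pandas DataFrame with the columns Site, Species and Count (the names long_df and abundance are only illustrative), pivot_table can do the reshaping in one call:

```python
import pandas as pd

# Toy data mirroring the question's long-format table (Site, Species, Count).
long_df = pd.DataFrame({
    "Site": ["a", "b", "c", "c"],
    "Species": ["Abies", "Alnus", "Pinus", "Artem"],
    "Count": [14, 1, 1, 2],
})

# One row per site, one column per species. Repeated (Site, Species) pairs
# are summed, and combinations that never occur are filled with 0.
abundance = long_df.pivot_table(
    index="Site",
    columns="Species",
    values="Count",
    aggfunc="sum",
    fill_value=0,
).astype(int)  # make sure the counts are integers, as in the steps above

print(abundance)
```

The species columns come out sorted alphabetically rather than in order of first appearance; if the original order matters, the columns can be reordered afterwards, for example with abundance[["Abies", "Alnus", "Pinus", "Artem"]].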

Upvotes: -1
