Reputation: 51
I have a dataset (as.data.frame) like this one:
Site | Species | Count |
---|---|---|
a | Abies | 14 |
b | Alnus | 1 |
c | Pinus | 1 |
c | Artem | 2 |
n | ... | ... |
The number of sites is 26000. I need to convert it into a matrix like this one in R:
Site | Abies | Alnus | Pinus | Artem |
---|---|---|---|---|
a | 14 | 0 | 0 | 0 |
b | 0 | 1 | 0 | 0 |
c | 0 | 0 | 1 | 2 |
n | ... | ... | ... | ... |
I came across the 'fossil' package and its create.matrix function. This function creates the matrix I need, but only with the presence (1) or absence (0) of each species at each site. I need the abundance (count), not the presence-absence (1-0).
Upvotes: 1
Views: 2415
Reputation: 36
I hope I'm not too late to answer your question.
If you type `?create.matrix` in the RStudio console, you get the documentation for the function. It says that you can use your original raw data to make an abundance matrix, but you have to include a few extra arguments: `tax.name` to indicate the species names, `locality` to indicate the sites, `abund.col` to indicate the count of each species, and `abund = TRUE` to let the function know we're working with abundance data.
In your case...
df <- create.matrix(x, tax.name = "Species",
locality = "Site",
abund.col = "Count",
abund = TRUE)
Where `x` is the name of your data.frame containing those three columns (Site, Species and Count). However, this will create a matrix where the rows are the species and the columns are the sites. If you want to transpose it, just use `t(df)` to move the species to the columns and the sites to the rows.
Hope this was helpful, also you can check the rest of the documentation right here.
Also, it is important to know that the output of create.matrix is not a data.frame, so you might want to convert it to one with `as.data.frame` while doing the transposition:
abundance.matrix <- as.data.frame(t(df))
Upvotes: 2
Reputation: 52
import pandas as pd
import numpy as np

# One single-cell data frame per record: index = site, column = species
df1 = pd.DataFrame([14], index=['A'], columns=['Abies'])
df2 = pd.DataFrame([1], index=['B'], columns=['Alnus'])
df3 = pd.DataFrame([1], index=['C'], columns=['Pinus'])
df4 = pd.DataFrame([2], index=['C'], columns=['Artem'])

# Outer-merge on the index so every site and every species is kept
A_B = pd.merge(df1, df2, how='outer', left_index=True, right_index=True)
C_C = pd.merge(df3, df4, how='outer', left_index=True, right_index=True)
new = pd.merge(A_B, C_C, how='outer', left_index=True, right_index=True)

# Missing site/species combinations become 0, then counts go back to int
new = new.replace(np.nan, 0)
new = new.astype(int)
print(new)
Import two libraries: numpy and pandas.
Create a data frame for each count, with 'Site' as the index and 'Species' as the column.
Merge all those data frames.
Replace NaN values with 0.
Convert 'float' to 'int'.
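Building one data frame per record would be hard to maintain for the roughly 26000 sites mentioned in the question. As a rough sketch of the same idea, and assuming the long table is already loaded into a pandas data frame (here a hypothetical `long_df` with the question's `Site`, `Species` and `Count` columns), `pivot_table` can do the reshape in one step:

```python
import pandas as pd

# Hypothetical stand-in for the real data: one row per site/species record,
# using the four example rows from the question.
long_df = pd.DataFrame({
    "Site": ["a", "b", "c", "c"],
    "Species": ["Abies", "Alnus", "Pinus", "Artem"],
    "Count": [14, 1, 1, 2],
})

# Sites become rows and species become columns.
# fill_value=0 fills site/species combinations that never occur;
# aggfunc="sum" adds up counts if a site/species pair appears more than once.
abundance = long_df.pivot_table(
    index="Site",
    columns="Species",
    values="Count",
    aggfunc="sum",
    fill_value=0,
)

print(abundance)
```

Note that the species columns come out in alphabetical order rather than in order of first appearance, and that summing duplicates is an assumption about how repeated records should be handled.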
Upvotes: -1