Title: | Extracting and Visualizing Paleobiodiversity |
---|---|
Description: | Contains various tools for conveniently downloading and editing taxon-specific datasets from the Paleobiology Database <https://paleobiodb.org>, extracting information on abundance, temporal distribution of subtaxa and taxonomic diversity through deep time, and visualizing these data in relation to phylogeny and stratigraphy. |
Authors: | Darius Nau [aut, cre] |
Maintainer: | Darius Nau |
License: | GPL (>= 3) |
Version: | 0.4.0 |
Built: | 2025-02-15 05:25:26 UTC |
Source: | https://github.com/cran/paleoDiv |
Make a data.frame() that can be used to plot abundance data with density plots, e.g. in ggplot2
ab.gg(data, taxa = NULL, agerange = c(252, 66), precision_ma = 1)
data |
list()-object containing occurrence data.frames or single occurrence data.frame() |
taxa |
Selection of taxa to include. If NULL, then abundance is tabulated for each unique factor level of data$tna |
agerange |
Range of geological ages to include in data.frame() |
precision_ma |
Size of intervals (in ma) at which to calculate abundance within the age range. |
Each taxon receives one entry per occurrence per time interval. The number of entries per taxon at any given point is therefore proportional to the abundance of the taxon in the fossil record. The result can be used with frequency- or density-based plotting functions (e.g. hist(), ggplot2::geom_violin()). If the number of occurrences is an adequate proxy for abundance, using the age values from the original occurrence table directly will often be sufficient. The advantage of ab.gg(), which visualizes the results of abdistr_(), is that it can account for a column of abundance values in the occurrence dataset, if one is available.
A data.frame() with two columns: ma, for the numerical age, and tax, for the taxon.
data(archosauria)
ab.gg(data=archosauria, taxa=c("Ankylosauria","Stegosauria"))->thyreophora
library(ggplot2)
ggplot(data=thyreophora, aes(x=tax, y=ma, col=tax))+ylim(252,0)+geom_violin(scale="count")
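The expansion step can be sketched in base R without the package. This is a minimal illustration, not the ab.gg() source. The helper counts_to_long() and the per-interval counts are hypothetical; only the ma/tax column layout comes from the description above.

```r
# Hypothetical helper (not part of paleoDiv): expand per-interval counts
# into a long data.frame with one row per counted record, using the
# ma/tax column layout described for ab.gg()
counts_to_long <- function(ages, counts, tax) {
  stopifnot(length(ages) == length(counts))
  data.frame(ma = rep(ages, times = counts),
             tax = rep(tax, times = sum(counts)))
}

# Made-up counts for two taxa at three time steps (in ma)
ages <- c(160, 155, 150)
long <- rbind(counts_to_long(ages, c(1, 3, 2), "TaxonA"),
              counts_to_long(ages, c(0, 2, 4), "TaxonB"))

# Cross-tabulating recovers the counts; hist(long$ma) or a violin plot of
# ma by tax would show heights or densities proportional to them
table(long$tax, long$ma)
```

Because each record becomes a row, any function that counts rows (histograms, count-scaled violins) effectively plots the underlying counts.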
Count the number of entries in an occurrence or collection data.frame at specific points in geological time
abdistr_( x, table = NULL, ab.val = table$abund_value, ab.val.na = 1, smooth = 0, max = table$eag, min = table$lag, w = rep(1, length(x)) )
x |
A numeric vector giving the times (in ma) at which to determine the number of overlapping records. |
table |
An occurrence or collection dataset |
ab.val |
Abundance value to be used. Default is table$abund_value. If set to 1, each occurrence is treated as representing one specimen. If NULL (e.g. because this column does not exist) or NA, each occurrence is treated as the number of specimens specified under ab.val.na |
ab.val.na |
Value to substitute for missing entries in abundance values. Defaults to 1. Either a single numeric or a function to be applied to all non-missing entries of ab.val (e.g. mean() or median()). |
smooth |
The smoothing margin, in units of ma. Corresponds to the plusminus parameter of rmeana(). Defaults to 0, i.e. no smoothing (beyond that implied by the spacing of the values in x) |
max |
Vector or column containing maximum age of each occurrence or collection |
min |
Vector or column containing minimum age of each occurrence or collection |
w |
A vector of weights. Must be the same length as x |
A numeric vector of the same length as x. It gives the estimated number of occurrence records (if ab.val is 1), specimens (if abundance values are used) or collections (if collection data are used instead of occurrences) overlapping each temporal value given in x
data(archosauria)
abdistr_(x=c(170:120), table=archosauria$Stegosauria)
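The counting principle sums the abundance values of all records whose age range contains a given age. It can be sketched in base R as below. count_overlaps() is a hypothetical name, ranges are assumed to be closed intervals [min, max], and the records are made up. The smooth and w arguments are left out, and abdistr_() itself may differ in these details.

```r
# Hypothetical sketch (not the paleoDiv source): for each age in x, sum the
# abundance values of all records whose [min, max] range contains that age
count_overlaps <- function(x, max, min, ab.val = rep(1, length(max)),
                           ab.val.na = 1) {
  # Fill missing abundances with a constant or a summary of the known values
  fill <- if (is.function(ab.val.na)) ab.val.na(ab.val[!is.na(ab.val)]) else ab.val.na
  ab.val[is.na(ab.val)] <- fill
  vapply(x, function(t) sum(ab.val[min <= t & max >= t]), numeric(1))
}

# Three made-up records: older bound (max), younger bound (min), abundance
rec_max <- c(150, 148, 140)
rec_min <- c(140, 130, 135)
rec_ab  <- c(2, NA, 5)

count_overlaps(c(150, 145, 138), rec_max, rec_min, rec_ab)        # returns c(2, 3, 6)
count_overlaps(c(150, 145, 138), rec_max, rec_min, rec_ab, mean)  # returns c(2, 5.5, 8.5)
```

Passing a function such as mean as ab.val.na replaces each missing abundance with a summary of the known ones (here 3.5), mirroring the option described for ab.val.na.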
Add transparency to any color
add.alpha(col, alpha = 0.5)
col |
Color value or vector of colors |
alpha |
Opacity value to apply to the color(s) |
A character vector containing color hex codes.
add.alpha("red",0.8)
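One way to obtain semi-transparent hex codes in base R is to convert the color to RGB and re-encode it with an alpha channel. The sketch below uses a hypothetical name, with_alpha(). add.alpha() itself may be implemented differently.

```r
# Hypothetical sketch: append an alpha channel to any R color specification
with_alpha <- function(col, alpha = 0.5) {
  m <- grDevices::col2rgb(col) / 255                      # 3 x n matrix, values in [0, 1]
  grDevices::rgb(m[1, ], m[2, ], m[3, ], alpha = alpha)   # "#RRGGBBAA" strings
}

with_alpha("red", 0.8)                 # "#FF0000CC" (0.8 * 255 = 204 = 0xCC)
with_alpha(c("red", "steelblue"), 0.3)
```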
A dataset containing earliest and latest occurrence dates for clades shown in the example phylogeny.
ages_archosauria
A matrix with 13 rows and 2 columns containing:
Earliest occurrence age
Latest occurrence age
…for each taxon
A dataset of stratigraphic ranges of species within the clades in tree_archosauria.
archosauria
A list() object containing 2 occurrence data.frames, 1 collections data.frame and 15 species-range tables (all as data.frames) with the following data in each:
taxon names (species names)
maximum ages
minimum ages
mean ages
Generated from data downloaded from the Paleobiology Database (https://paleobiodb.org) using the functions pdb(), occ.cleanup() and mk.sptab().
Convert geological ages in taxon-range tables as constructed by mk.sptab() for plotting alongside a time-calibrated phylogeny.
convert.sptab(sptab, tree = NULL, root.time = tree$root.time)
sptab |
Taxon-range table to convert |
tree |
Optional phylogenetic tree to draw root.time from |
root.time |
Root time of the tree, used for converting ages |
A data.frame() object in the format of the original taxon-range table, but with geological ages converted for plotting alongside the phylogenetic tree.
data(archosauria)
data(tree_archosauria)
convert.sptab(archosauria$sptab_Coelophysoidea,tree_archosauria)
Darken or lighten colors by adding to, subtracting from, or replacing the value (third) channel in HSV color space
darken(x, add = 0, abs = NULL)
x |
Color value or vector of colors |
add |
Value to be added to the third HSV channel. Can be a vector of the same length as x, or a vector of any length if length(x)==1 |
abs |
Value to substitute for the third HSV channel. If set, this overrides the setting for parameter add. Can be a vector of the same length as x, or a vector of any length if length(x)==1 |
A color value or vector of color values of the same length as x (or, if length(x)==1, of the same length as add or abs)
darken(ggcol(3),abs=0.5)
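The HSV adjustment described above can be sketched with base R's rgb2hsv() and hsv(). The name adjust_value() is hypothetical. Clamping the result to [0, 1] is an assumption about how out-of-range values are handled, and darken() may behave differently.

```r
# Hypothetical sketch: shift or replace the value (third HSV) channel of colors
adjust_value <- function(x, add = 0, abs = NULL) {
  hsv_m <- grDevices::rgb2hsv(grDevices::col2rgb(x))  # rows "h", "s", "v" in [0, 1]
  v <- if (is.null(abs)) hsv_m["v", ] + add else abs
  v <- pmin(pmax(v, 0), 1)                            # clamp to the valid range
  # hsv() recycles h, s and v, so a single color combined with several
  # add/abs values yields several shades
  grDevices::hsv(hsv_m["h", ], hsv_m["s", ], v)
}

adjust_value("red", abs = 0.5)                   # a darker red
adjust_value("red", add = c(-0.2, -0.4, -0.6))   # three progressively darker reds
```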
Make a data.frame() that can be used to plot diversity data with density plots, e.g. in ggplot2
div.gg(data, taxa, agerange = c(252, 66), precision_ma = 1, prefix = "sptab_")
data |
list()-object containing taxon-range tables |
taxa |
Selection of taxa to include |
agerange |
Range of geological ages to include in data.frame() |
precision_ma |
Size of intervals (in ma) at which to calculate diversity within the age range. |
prefix |
Prefix under which to find taxon-range tables in data |
Each taxon receives one entry per subtaxon (e.g. species) for every time interval in which that subtaxon occurs. The number of entries per taxon at any given point is therefore proportional to the diversity of the taxon. This can be used to trick density functions (e.g. hist(), density()) into plotting diversity diagrams of various types, and is most useful with the ggplot2 functions geom_violin(), geom_histogram() or geom_density(). A simpler way to achieve a similar result would be to pass the taxon-range tables directly to these functions. However, each subtaxon is then counted only once, so diversity is relatively underestimated for taxa with long-lived subtaxa. div.gg() avoids this problem by representing each subtaxon in every time interval in which it occurs. The relative number of entries in the returned data.frame is then proportional to the relative number of subtaxa whose ranges overlap each point in time.
A data.frame() with two columns: ma, for the numerical age, and tax, for the taxon.
data(archosauria)
div.gg(archosauria, taxa=c("Pterosauria","Aves"), agerange=c(252,0),precision_ma=1)->flyers
library(ggplot2)
ggplot(data=flyers, aes(x=tax, y=ma))+ylim(252,0)+geom_violin(scale="count")
ggplot(data=flyers, aes(col=tax, x=ma))+xlim(252,0)+geom_density(adjust=0.5)
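A small base-R sketch shows the difference between counting each subtaxon once and counting it in every interval it spans. The range table below is made up, and the expansion loop illustrates the principle described above rather than the div.gg() source. Only the tna/max/min and ma/tax column names come from this manual.

```r
# Made-up taxon-range table: max = older bound, min = younger bound (in ma)
sptab <- data.frame(tna = c("sp1", "sp2", "sp3"),
                    max = c(160, 150, 150),
                    min = c(140, 148, 146))
ages <- seq(160, 140, by = -2)   # time steps, i.e. precision_ma = 2

# One row per subtaxon per time step at which its range is present
long <- do.call(rbind, lapply(ages, function(t) {
  present <- sptab$max >= t & sptab$min <= t
  data.frame(ma = rep(t, sum(present)), tax = rep("TaxonA", sum(present)))
}))

table(long$ma)   # entries per step = number of ranges overlapping that step
nrow(sptab)      # using the range table directly gives only 3 entries in total
```

The long-lived sp1 contributes an entry at every step, so the density of long$ma tracks range-through diversity rather than the number of named species.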
Calculate total species diversity for any point in time based on a taxon-range table
divdistr_( x, table = NULL, w = rep(1, length(x)), smooth = 0, max = table$max, min = table$min )
x |
A point in time or vector of points in time, in ma, at which species diversity is to be determined. |
table |
A taxon-range table to be used, usually the output of mk.sptab() |
w |
A vector of weights to apply to the estimated (raw) diversity figures. This vector needs to be of the same length as x. Each raw diversity estimate will then be multiplied by the corresponding weight. Can be used to account for differences in collection intensity or sampling biases, if these can be quantified (e.g. by analyzing collection records). |
smooth |
The smoothing margin, in units of ma. Corresponds to the plusminus parameter of rmeana(). Defaults to 0, i.e. no smoothing (beyond that implied by the spacing of the values in x) |
max |
Vector or column containing the maximum age of each entry in the taxon-range table. Defaults to table$max |
min |
Vector or column containing the minimum age of each entry in the taxon-range table. Defaults to table$min |
divdistr_() produces a "maximum" estimate of taxonomic diversity at any given point in time in the fossil record. It counts the number of taxon ranges (from the provided range table) that overlap each age provided in x. Because age estimates are uncertain, this may overestimate the actual fossil diversity at each point in time, especially where taxon-specific ranges meet or overlap. Moreover, this is a "raw", uncorrected diversity estimate that does not account for differences in sampling intensity across the time interval investigated. Rudimentary support for such a correction is provided by the w argument, which allows the user to supply a vector of weights (of the same length as x) to be multiplied with the raw diversity estimates. Such weights can, for instance, be based on (the inverse of) the number of collections overlapping each age in x. These numbers can be calculated with the same basic approach as the raw diversity, by downloading collection instead of occurrence data.
A numeric vector containing taxon diversity (at the chosen taxonomic level used in the generation of the range table) at the provided ages.
data(archosauria)
divdistr_(c(170:140),table=archosauria$sptab_Stegosauria)
curve(divdistr_(x,archosauria$sptab_Stegosauria), xlim=c(200,100),ylim=c(-5,35))
ts.stages(ylim=c(-6,-1),alpha=0.3,border=add.alpha("grey"))
ts.periods(ylim=c(-6,-1),alpha=0.0)
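The range-through counting described above, including the optional weights, can be sketched in a few lines of base R. count_ranges() is a hypothetical name and ranges are assumed to be closed intervals [min, max]. The ranges and weights are made up, and the smooth argument is not reproduced.

```r
# Hypothetical sketch (not the paleoDiv source): count ranges overlapping each
# age in x, then multiply by per-age weights (e.g. inverse sampling intensity)
count_ranges <- function(x, max, min, w = rep(1, length(x))) {
  stopifnot(length(w) == length(x))
  raw <- vapply(x, function(t) sum(max >= t & min <= t), integer(1))
  raw * w
}

rng_max <- c(160, 150, 150)   # made-up older bounds (ma)
rng_min <- c(140, 148, 146)   # made-up younger bounds (ma)

count_ranges(c(155, 150, 147, 141), rng_max, rng_min)                         # returns c(1, 3, 2, 1)
count_ranges(c(155, 150, 147, 141), rng_max, rng_min, w = c(1, 0.5, 0.5, 1))  # returns c(1, 1.5, 1, 1)
```

At 150 ma all three ranges touch, which illustrates why counts at shared range boundaries tend toward the "maximum" estimate described above.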
Count the number of taxon records overlapping a specific time interval.
divdistr_int(x, table = NULL, ids = FALSE, max = table$max, min = table$min)
x |
A numeric vector of length 2 specifying the start and end (in ma) of the time interval in question. |
table |
Taxon-range table to use |
ids |
Logical indicating whether to return the indices of the overlapping entries in the taxon-range table (TRUE) or their number (FALSE, the default) |
max |
Vector or column containing the maximum age of each entry in the taxon-range table. Defaults to table$max |
min |
Vector or column containing the minimum age of each entry in the taxon-range table. Defaults to table$min |
A single numeric giving the number of entries in table overlapping the specified interval, or (if ids==TRUE) a numeric vector giving their indices.
data(archosauria)
divdistr_int(x=c(201,220), table=archosauria$sptab_Coelophysoidea)
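Two closed ranges overlap when each one begins before the other ends. Below is a base-R sketch of that test, with a hypothetical name and made-up ranges. Sorting x with range() is an assumption that lets the interval be given in either order, as in the example above.

```r
# Hypothetical sketch: a range [min, max] overlaps the interval [r[1], r[2]]
# when it is neither entirely older nor entirely younger than that interval
overlap_interval <- function(x, max, min, ids = FALSE) {
  r <- range(x)                            # younger bound, older bound
  hits <- which(max >= r[1] & min <= r[2])
  if (ids) hits else length(hits)
}

rng_max <- c(230, 215, 200, 205)           # made-up older bounds (ma)
rng_min <- c(221, 205, 190, 201)           # made-up younger bounds (ma)

overlap_interval(c(201, 220), rng_max, rng_min)              # 2
overlap_interval(c(201, 220), rng_max, rng_min, ids = TRUE)  # 2 4
```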
A dataset of diversity by stage, exemplifying the output produced by the divDyn package.
diversity_table
A data.frame() containing mean ages and diversity figures by stage.
ages for each stage in the Phanerozoic
ages converted for plotting on tree_archosauria, using the tsconv()-function
diversity by stage for Sauropodomorpha
diversity by stage for each of the taxa represented in tree_archosauria
...
Replicate the standard color scheme from ggplot2
ggcol(n)
n |
Length of color vector to return. |
A character vector containing color hex codes.
ggcol(3)
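ggplot2's default discrete palette is commonly reproduced by spacing hues evenly around the HCL color wheel at fixed chroma and luminance. The sketch below follows that widely used recipe (hue offset 15, c = 100, l = 65). Whether ggcol() uses exactly these constants is an assumption, and hue_palette() is a hypothetical name.

```r
# Hypothetical sketch of ggplot2-style default hues
hue_palette <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)               # evenly spaced; 375 wraps to 15
  grDevices::hcl(h = hues, c = 100, l = 65)[seq_len(n)]  # drop the duplicate last hue
}

hue_palette(3)
```

Generating n + 1 hues and dropping the last one avoids returning two identical colors, since 15 and 375 degrees are the same hue.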
Plot data as a jitter plot
jitterp(x, y, width, col = "black", alpha = 0.5, ...)
x |
x values to plot (if single value and y is a vector, plot is vertical) |
y |
y value at which to plot (if single value, plot is horizontal) |
width |
standard deviation for jitter |
col |
color for points |
alpha |
opacity for points |
... |
other parameters to be passed on to points() |
Adds the points to the open plotting device as a jitter plot and returns an invisible list() object containing the positions of all points.
c(1,2,3,2,3,2,3,4,4)->tmp
hist(tmp)
jitterp(x=tmp, y=1, width=0.1)
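The jitter itself amounts to adding normally distributed noise with standard deviation width to the coordinate that is held constant. The sketch below, with the hypothetical name jitter_positions(), computes only the positions. Drawing them, which jitterp() does via points(), is left to the caller.

```r
# Hypothetical sketch: jitter the constant coordinate with Gaussian noise
jitter_positions <- function(x, y, width) {
  if (length(y) == 1) {
    # horizontal strip: spread points vertically around the single y value
    list(x = x, y = y + stats::rnorm(length(x), mean = 0, sd = width))
  } else {
    # vertical strip: spread points horizontally around the x value(s)
    list(x = x + stats::rnorm(length(y), mean = 0, sd = width), y = y)
  }
}

set.seed(1)
pos <- jitter_positions(x = c(1, 2, 3, 2, 3, 2, 3, 4, 4), y = 1, width = 0.1)
str(pos)   # pos$x and pos$y could then be passed to graphics::points()
```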
Generate a taxon-range table based on an occurrence dataset.
mk.sptab( xx = NULL, taxa = xx$tna, earliest = xx$eag, latest = xx$lag, tax = NULL )
xx |
A data.frame() of occurrence records, containing at least the following columns: the taxonomic name at the level at which ranges are to be determined (e.g. species or genus), the earliest possible age of each occurrence and the latest possible age of each occurrence. If xx==NULL, each column or vector must be specified individually using the following parameters |
taxa |
column/vector containing the taxonomic variable. Defaults to xx$tna |
earliest |
column/vector containing the earliest age estimate. Defaults to xx$eag. |
latest |
column/vector containing the latest age estimate. Defaults to xx$lag. |
tax |
Optional. A single character string containing the taxon name, to be added as another column to the range table (useful for categorization, should several range tables be concatenated, e.g. using rbind()). |
A data.frame() containing the taxon names, the maximum and minimum age for each taxon, and (optionally) a column with the name of the higher-level taxon.
data(archosauria)
mk.sptab(archosauria$Stegosauria)->sptab_Stegosauria
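Building a range table amounts to taking, for each name, the oldest earliest-age estimate and the youngest latest-age estimate. The base-R sketch below uses the tna/max/min column names that divdistr_() and synonymize() expect by default. The helper name, the mean_age column name and the toy occurrences are hypothetical.

```r
# Hypothetical sketch (not the paleoDiv source): one row per taxon with its
# oldest (max) and youngest (min) possible age across all occurrences
range_table <- function(taxa, earliest, latest, tax = NULL) {
  oldest <- tapply(earliest, taxa, max)
  youngest <- tapply(latest, taxa, min)
  out <- data.frame(tna = names(oldest),
                    max = as.numeric(oldest),
                    min = as.numeric(youngest[names(oldest)]))
  out$mean_age <- (out$max + out$min) / 2   # midpoint of each range
  if (!is.null(tax)) out$tax <- tax         # optional higher-level label
  out
}

# Made-up occurrences: name, earliest (eag) and latest (lag) age in ma
occ <- data.frame(tna = c("A", "B", "A", "C"),
                  eag = c(150, 160, 155, 140),
                  lag = c(145, 150, 148, 135))
range_table(occ$tna, occ$eag, occ$lag, tax = "Clade1")
```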
Wrapper around jitterp that plots multiple jitter plots on the same plotting device (analogous to violins())
multijitter( x, data = NULL, group = NULL, horiz = FALSE, order = NULL, xlab = "", ylab = "", col = "black", pch = 16, spaces = "_", width = 0.1, xlim = NULL, ylim = NULL, add = TRUE, ax = FALSE, srt = 45, ... )
x |
plotting statistic (numeric vector) or formula object from which a plotting statistic and grouping variable can be extracted (i.e. of form x~group) |
data |
data.frame object containing the variables used in x (if x is a formula) |
group |
grouping variable |
horiz |
logical indicating whether to plot horizontally |
order |
order of factor levels of categorical factor |
xlab |
x axis label |
ylab |
y axis label |
col |
vector of point colors |
pch |
vector of symbols |
spaces |
character string in group to replace with spaces for labels, if not NULL |
width |
standard deviation for jitter |
xlim |
x limits (data limits used if NULL) |
ylim |
y limits (data limits used if NULL) |
add |
logical whether to add to existing plot (default: TRUE) |
ax |
whether to plot axes |
srt |
angle for categorical axis text rotation |
... |
other arguments to pass on to jitterp() and plot() |
data.frame(p=rnorm(50), cat=rep(c("A","B","B","B","B"),10))->d
multijitter(p~cat,d, add=FALSE)
Clean up occurrence dataset by removing commonly used character combinations in the identified name that will result in different factor levels for the same taxon.
occ.cleanup(x, remove = NULL, return.df = FALSE)
x |
An occurrence data.frame or character vector containing the variable to clean up (defaults to x$tna) |
remove |
Which values to remove. If NULL, a default set of commonly occurring character combinations is used ("n. gen.", "n. sp.", "cf.", "aff.", punctuation, as well as double, leading and trailing spaces). If user-defined, remove needs to be formatted as a named character vector with the values to be removed as names, i.e. in the format c("remove_this" = "", "removethistoo" = "") |
return.df |
A logical indicating whether to return the entire data.frame (if TRUE) or just the column of taxonomic names. |
A character vector containing the cleaned up taxonomic names or a dataframe with cleaned-up tna column (if return.df==TRUE).
data(archosauria)
occ.cleanup(archosauria$Stegosauria)->archosauria$Stegosauria
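A stripped-down version of this clean-up can be written with gsub() and trimws(). The sketch below is a hypothetical clean_names() helper. It uses a fixed list of qualifiers rather than the named-vector remove format described above, so it illustrates the idea rather than reproducing occ.cleanup().

```r
# Hypothetical sketch: normalize identified names so variants of one taxon match
clean_names <- function(x) {
  # Drop qualifiers such as "n. gen.", "n. sp.", "cf." and "aff." (literal matches)
  for (pat in c("n. gen.", "n. sp.", "cf.", "aff.")) {
    x <- gsub(pat, "", x, fixed = TRUE)
  }
  x <- gsub("[[:punct:]]", "", x)   # remove any remaining punctuation
  x <- gsub(" +", " ", x)           # collapse runs of spaces
  trimws(x)                         # strip leading and trailing spaces
}

clean_names(c("Stegosaurus cf. stenops",
              "Stegosaurus  stenops ",
              "Miragaia n. gen. longicollum"))
```

Qualifiers are removed before general punctuation so that their dots do not leave stray fragments such as "cf" or "n gen" behind.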
Download data from the paleobiology database.
pdb( taxon, interval = "all", what = "occs", full = FALSE, base = "https://paleobiodb.org/data1.2/", file = "list.csv", cc = NULL, envtype = NULL, append_additional = NULL )
taxon |
A taxon (base_name) for which to download records. |
interval |
A character string indicating over which temporal interval to download data (defaults to "all"), e.g. "Phanerozoic" or "Jurassic". |
what |
The type of data to download (for details, see https://paleobiodb.org/data1.2/). Defaults to "occs", which downloads occurrence data. Setting this parameter to "colls" will instead download collection data. |
full |
A logical indicating whether or not the full dataset is to be downloaded (defaults to FALSE). At the expense of larger file size, the full dataset contains a large number of additional columns containing data such as stratigraphy, phylogeny and (paleo)geography, which is useful for various purposes but not strictly necessary for graphing paleodiversity. |
base |
Character string containing base url to use. Defaults to https://paleobiodb.org/data1.2/. Entering "dev" serves as a shortcut to use https://dev.paleobiodb.org/data1.2/ instead (can sometimes be helpful if one of the two is unavailable). |
file |
Character string containing which file name to look for. Defaults to list.csv. |
cc |
Selection for continent (e.g. EUR for Europe, see paleobiodb.org documentation) |
envtype |
Selection for environment type (e.g. marine) |
append_additional |
Any additional character string to append to URL for pdb dataset |
A data.frame() containing the downloaded Paleobiology Database dataset. The column "identified_name" will be copied into the column "tna", and (if what=="occs") the columns "max_ma" and "min_ma" will be copied into the columns "eag" and "lag" respectively. This keeps the variable names compatible with the output of the deprecated package "paleobioDB".
pdb("Stegosauria")->Stegosauria
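The download arguments map onto a query URL for the PBDB data service. The sketch below only assembles such a URL and does not download anything. pbdb_url() is a hypothetical name, and the query parameter names (base_name, interval, show=full) are assumed to follow the public PBDB API. The exact query that pdb() builds may differ.

```r
# Hypothetical sketch: assemble a PBDB data-service URL from pdb()-style arguments
pbdb_url <- function(taxon, what = "occs", base = "https://paleobiodb.org/data1.2/",
                     file = "list.csv", interval = NULL, full = FALSE) {
  query <- c(base_name = taxon)
  if (!is.null(interval)) query <- c(query, interval = interval)
  if (full) query <- c(query, show = "full")           # assumed "full output" switch
  values <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, what, "/", file, "?",
         paste(names(query), values, sep = "=", collapse = "&"))
}

pbdb_url("Stegosauria")
pbdb_url("Stegosauria", interval = "Jurassic", full = TRUE)
# utils::read.csv(pbdb_url("Stegosauria")) would then fetch the table
```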
A wrapper around pdb(), occ.cleanup() and mk.sptab() to automatically download and clean occurrence data from the paleobiology database and build species-level taxon-range tables for multiple taxa in one step.
pdb.autodiv(taxa, cleanup = TRUE, interval = NULL, ...)
taxa |
Either a character vector of valid taxonomic names, or an object of class "phylo" whose tip labels are used instead. |
cleanup |
Logical indicating whether to apply occ.cleanup() to occurrence data after download (defaults to TRUE) |
interval |
Stratigraphic interval for which to download data (defaults to NULL, which downloads data for all intervals) |
... |
additional arguments to be passed on to pdb() |
A list() object containing occurrence data (saved under the taxon names given) and species-level taxon-range tables (saved with the prefix "sptab_" before the taxon names).
pdb.autodiv("Coelophysoidea")->coelo
Subtract one occurrence data.frame from another, for disentangling overlapping taxonomies or quantifying stem-lineage diversity.
pdb.diff(x, subtract, id_col = x$occurrence_no)
x |
Occurrence data from which to subtract. |
subtract |
Occurrence data frame or vector of occurrence numbers to subtract from x |
id_col |
Vector or column of x containing id to be used for determining which values are also found in subtract or subtract$occurrence_no |
A data.frame() containing the difference between the two occurrence datasets, i.e. all entries that are in x but not in subtract.
data(archosauria)
pdb.union(rbind(archosauria$Ankylosauria, archosauria$Stegosauria))->Eurypoda
pdb.diff(Eurypoda, subtract=archosauria$Stegosauria)
Form the union of two occurrence data.frames or remove duplicates from occurrence data.frame. Useful if parts of a clade are not included in the downloaded dataset and need to be added separately.
pdb.union(x, id_col = x$occurrence_no)
x |
Concatenated occurrence data.frames to be merged |
id_col |
Vector or column of x containing the IDs (e.g. occurrence numbers) used for identifying duplicate entries. Defaults to x$occurrence_no |
A data.frame() containing the first entry for each unique occurrence represented in x.
data(archosauria)
pdb.union(rbind(archosauria$Ankylosauria, archosauria$Stegosauria))->Eurypoda
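Both pdb.union() and pdb.diff() come down to matching records on an ID column. A base-R sketch of each is shown below. occ_union() and occ_diff() are hypothetical names, and the two tiny occurrence tables are made up.

```r
# Hypothetical sketches of ID-based set operations on occurrence tables
occ_union <- function(x, id_col = x$occurrence_no) {
  x[!duplicated(id_col), , drop = FALSE]   # keep the first copy of each ID
}
occ_diff <- function(x, subtract, id_col = x$occurrence_no) {
  ids <- if (is.data.frame(subtract)) subtract$occurrence_no else subtract
  x[!(id_col %in% ids), , drop = FALSE]    # drop IDs present in subtract
}

# Made-up occurrence tables sharing record 3
a <- data.frame(occurrence_no = c(1, 2, 3), tna = c("A", "B", "C"))
b <- data.frame(occurrence_no = c(3, 4), tna = c("C", "D"))

both <- occ_union(rbind(a, b))   # records 1, 2, 3, 4
occ_diff(both, subtract = b)     # records 1, 2
```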
Plots a phylogenetic tree with spindle-diagrams, optimized for showing taxonomic diversity.
phylo.spindles( phylo0, occ, stat = divdistr_, prefix = "sptab_", pos = NULL, ages = NULL, xlimits = NULL, ylimits = NULL, res = 1, weights = 1, dscale = 0.002, col = add.alpha("black"), fill = col, lwd = 1, lty = 1, cex.txt = 1, col.txt = add.alpha(col, 1), axis = TRUE, labels = TRUE, txt.y = 0.5, txt.x = NULL, adj.x = NULL, add = FALSE, tbmar = 0.2, smooth = 0, italicize = character() )
phylo0 |
A time-calibrated phylogenetic tree to plot with spindle diagrams, or a character vector of taxonomic names for which to plot spindle diagrams. |
occ |
Either a list()-object containing taxon-range tables for plotting diversity, or a matrix() or data.frame()-object that contains numerical plotting statistics. If the latter is provided, the default use of divdistr_() is overridden and the function will look for a column named "x" and columns matching the phylogeny tip.labels to plot the spindles. |
stat |
Plotting statistic to be passed on to viol(). Defaults to use divdistr_(). |
prefix |
Prefix for taxon-range tables in occ. Defaults to "sptab_" |
pos |
Position at which to draw spindles. If NULL (default), then spindles are drawn at c(1:n) where n is the number of taxa in phylo0. |
ages |
Optional matrix with lower and upper age limits for each spindle, formatted like the output of tree.ages() (most commonly the same calibration matrix used to time-calibrate the tree) |
xlimits |
Limits for plotting on the x axis. |
ylimits |
Limits for plotting on the y axis. If NULL (default) or not a numeric vector of length 2, the y limits are instead constructed from the tbmar parameter and the number of entries in the phylogeny or taxon list. |
res |
Temporal resolution of diversity estimation (if occ is a matrix or data.frame containing plotting statistics, this is ignored) |
weights |
Weights for diversity estimation. Must have the same length as the range of xlimits divided by res. For details, see divdistr_() |
dscale |
Scale value of the spindles on the y axis. Should be adjusted manually to optimize visibility of results. |
col |
Color to use for the border of the plotted spindles |
fill |
Color to use for the fill of the plotted spindles. Defaults to col. |
lwd |
Line width for the plotted spindles. |
lty |
Line type for the plotted spindles. |
cex.txt |
Adjustment for tip label text size |
col.txt |
Tip label text color. Defaults to the same color as col, but without transparency |
axis |
Logical indicating whether to plot (temporal) x axis (defaults to TRUE) |
labels |
Logical indicating whether to plot tip labels of phylogeny (defaults to TRUE) |
txt.y |
y axis alignment of tip labels |
txt.x |
x coordinates for plotting tip labels. Can be a single value applicable to all labels, or a vector of the same length as phylo0$tip.label. If NULL (default), the right margin of the plot is used with right-hand alignment for the text. |
adj.x |
Numeric value giving the alignment of tip labels on the x axis. If NULL (default), this defaults to 0 (left-aligned), but it can also be set to any other adjustment value (e.g. 0.5 for centered, 1 for right-aligned). |
add |
Logical indicating whether to add to an existing plot, in which case only the spindles are plotted on top of an existing phylogeny, or not, in which case the phylogeny is plotted along with the spindles. |
tbmar |
Top and bottom margin around the plot. Numeric of either length 1 or 2 |
smooth |
Smoothing parameter to be passed on to divdistr_() |
italicize |
Character or numeric vector specifying which labels to italicize, if any. |
The phylo.spindles() function plots a phylogeny with spindle diagrams at each of its terminal branches. Various data can be represented depending on the settings for occ and stat (e.g. disparity, abundance, or diversity measures such as those output by the divDyn package). The function is optimized to plot the results of divdistr_(), and does so by default. If another function is supplied as stat, it must accept the sequence resulting from xlimits and res as its first argument and occ as its 'table' argument. It must return a vector of the same length as range(xlimits)/res to be plotted. If occ is a list() object containing multiple data.frames, occurrence datasets or taxon-range tables are automatically converted to work with abdistr_() or divdistr_() respectively (if the plot contains a phylogeny). If occ is a matrix or data.frame, the x values must already be converted (e.g. using tsconv()) to match the phylogeny.
A plotted phylogeny with spindle diagrams plotted at each of its terminal branches.
data(archosauria)
data(tree_archosauria)
data(ages_archosauria)
data(diversity_table)
phylo.spindles(tree_archosauria,occ=archosauria,dscale=0.005,ages=ages_archosauria,txt.x=66)
phylo.spindles(tree_archosauria,occ=diversity_table,dscale=0.005,ages=ages_archosauria,txt.x=66)
Redraw the lines of a phylogenetic tree.
redraw.phylo( saved_plot = NULL, col = "black", lwd = 1, lty = 1, lend = 2, arrow.l = 0, arrow.angle = 45, arrow.code = 2, indices = NULL )
saved_plot |
Optional saved plot (e.g. using get("last_plot.phylo", envir = ape::.PlotPhyloEnv)) to be used instead of currently active plot. |
col |
Color to be used for redrawing tree edges. |
lwd |
Line width to be used for redrawing tree edges. |
lty |
Line type to be used for redrawing tree edges. |
lend |
Style of line ends to be used for redrawing tree edges. |
arrow.l |
Length of arrow ends to be used for plotting. Defaults to 0, i.e. no visible arrow. |
arrow.angle |
Angle of arrow ends to be used for plotting. Defaults to 45 degrees. |
arrow.code |
Arrow code to be used for plotting. For details, see ?arrows |
indices |
Optional indices of the edges to redraw. Can be used to highlight specific edges in a different color or style. |
Nothing (redraws selected edges of the phylogeny on the active plot device)
data(tree_archosauria)
ape::plot.phylo(tree_archosauria)
redraw.phylo(col="darkred",lwd=3,indices=c(19:24))
redraw.phylo(col="red",lwd=3,indices=c(18),arrow.l=0.1)
Calculate a rolling mean for a vector x.
rmean(x, width = 11)
x |
Numeric vector for which to calculate the rolling mean. |
width |
Width of the interval over which to calculate rolling mean values. Should be an odd number (even numbers are coerced to the next-higher odd number) |
A numeric vector of the same length as x containing the calculated rolling means, with the first and last few values being NA (depending on the setting for width)
rmean(x=c(1,2,3,4,5,6),width=5)
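A centered moving average with NA padding at both ends, as described above, can be sketched in base R. rolling_mean() is a hypothetical name. The odd-width coercion mirrors the description of width, while the exact edge handling in rmean() is an assumption.

```r
# Hypothetical sketch: centered rolling mean with NA where the window is incomplete
rolling_mean <- function(x, width = 11) {
  if (width %% 2 == 0) width <- width + 1   # even widths become the next odd one
  h <- (width - 1) / 2                      # number of values on each side of the center
  n <- length(x)
  out <- rep(NA_real_, n)                   # edges without a full window stay NA
  if (n >= width) {
    for (i in (h + 1):(n - h)) out[i] <- mean(x[(i - h):(i + h)])
  }
  out
}

rolling_mean(c(1, 2, 3, 4, 5, 6), width = 5)   # NA NA 3 4 NA NA
```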
Calculate a rolling mean based on distance within a second variable.
rmeana(x0, y0, x1 = NULL, plusminus = 5, weighting = FALSE, weightdiff = 0)
x0 |
Numeric independent variable at which rolling mean is to be calculated. |
y0 |
Numeric variable of which mean is to be calculated. |
x1 |
Optional. New x values at which rolling mean of y0 is to be calculated. If x1==NULL, calculation will take place at original (x0) values. |
plusminus |
Criterion for the width (in x0) of the interval over which rolling mean values are to be calculated. The value represents the margin measured from every value of x1 or x0, i.e. for plusminus==5, the interval over which the means are drawn will range from values with x-x_i=5 to x-x_i=-5. |
weighting |
Whether or not to apply weighting. If weighting==TRUE, then means are calculated as weighted means with weighting decreasing linearly towards the margins of the interval over which the mean is to be drawn. |
weightdiff |
Minimum weight to be added to all weights if weighting==TRUE. Defaults to 0. |
A numeric vector of the same length as either x1 (if not NULL) or x0, containing the calculated rolling means.
rmeana(x0=c(1,2,3,4,5,6), y0=c(2,3,3,4,5,6))
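A distance-based rolling mean differs from rmean() in that the window is defined in units of x0 rather than as a number of neighboring values. The sketch below uses the hypothetical name rolling_mean_by(). It treats the window as inclusive (|x - x_i| <= plusminus), and when weighting is TRUE it applies linearly decreasing weights plus weightdiff. Both details are assumptions based on the parameter descriptions above.

```r
# Hypothetical sketch: rolling mean over a window of +/- plusminus in x0 units
rolling_mean_by <- function(x0, y0, x1 = NULL, plusminus = 5,
                            weighting = FALSE, weightdiff = 0) {
  if (is.null(x1)) x1 <- x0
  vapply(x1, function(xi) {
    d <- abs(x0 - xi)
    inside <- d <= plusminus
    if (weighting) {
      w <- (1 - d[inside] / plusminus) + weightdiff  # linear decay toward the margins
      sum(w * y0[inside]) / sum(w)
    } else {
      mean(y0[inside])
    }
  }, numeric(1))
}

rolling_mean_by(x0 = 1:6, y0 = c(2, 3, 3, 4, 5, 6), plusminus = 1)
rolling_mean_by(x0 = 1:6, y0 = c(2, 3, 3, 4, 5, 6), x1 = 3,
                plusminus = 2, weighting = TRUE)   # 3.25
```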
Extract subsets of an occurrence data.frame.
stax.sel(taxa, rank = x$class, x = NULL)
taxa |
A vector containing subtaxa (or any other entries matching entries of rank) to be returned |
rank |
Vector or column of x in which to look for entries matching taxa. Defaults to x$class, for selecting class-level subtaxa from large datasets (only works for data downloaded with pdb(..., full=TRUE)) |
x |
Optional occurrence data.frame. If set, a data.frame with the selected entries will be returned. |
If is.null(x) (default), a vector giving the indices of values matching taxa in rank. Otherwise, an occurrence data.frame() containing only the selected taxa or values.
data(archosauria)
archosauria$Stegosauria->stegos
stax.sel(c("Stegosaurus"), rank=stegos$genus,x=stegos)->Stegosaurus
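The selection amounts to matching entries of rank against taxa. As a rough base-R illustration (the toy table and its values are made up, and equivalence with stax.sel() is an assumption), the two return modes correspond to an index vector and a row subset:

```r
# Toy occurrence table (made-up values, not from the package data)
occ <- data.frame(
  genus  = c("Stegosaurus", "Kentrosaurus", "Stegosaurus", "Huayangosaurus"),
  max_ma = c(155, 157, 152, 168),
  min_ma = c(145, 152, 145, 165),
  stringsAsFactors = FALSE
)

# Indices of matching rows, analogous to leaving x = NULL
idx <- which(occ$genus %in% c("Stegosaurus"))

# Subset of the data.frame, analogous to supplying x = occ
occ[idx, ]
```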
Combine selected entries in a taxon-range table to remove duplicates
synonymize(x, table = NULL, ids = table$tna, max = table$max, min = table$min)
x |
Indices or values (taxon names) to combine |
table |
Taxon-range table |
ids |
Vector or column of taxon names (used for matching taxon names in x). Defaults to table$tna |
max |
Vector or column containing maximum ages |
min |
Vector or column containing minimum ages |
This function is meant as an aid for manually editing species tables, removing synonyms or misspelled taxonomic names that would otherwise inflate the number of distinct taxa represented.
A data.frame containing taxon names, maximum, minimum and mean ages, with ranges for the selected entries merged and superfluous entries removed (note that the first taxon indicated by x is kept as valid).
data(archosauria)
sp<-archosauria$sptab_Stegosauria
synonymize(c(32,33),sp)->sp
synonymize(grep("stenops",sp$tna),sp)->sp
synonymize(c("Hesperosaurus mjosi","Stegosaurus mjosi"),sp)->sp
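The range merging described above can be sketched as follows. merge_ranges_sketch() and the toy table are hypothetical; the sketch keeps the first selected entry, widens its range to the earliest maximum and latest minimum age, and drops the others. It omits the mean-age column that synonymize() also returns.

```r
# Toy taxon-range table with columns tna, max and min (made-up values)
sp <- data.frame(
  tna = c("Stegosaurus stenops", "Stegosaurus armatus", "Stegosaurus ungulatus"),
  max = c(155, 157, 154),
  min = c(150, 152, 149),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not paleoDiv code)
merge_ranges_sketch <- function(table, i) {
  if (is.character(i)) i <- match(i, table$tna)  # accept names or indices
  keep <- i[1]                                   # first entry is kept as valid
  table$max[keep] <- max(table$max[i])
  table$min[keep] <- min(table$min[i])
  drop <- i[-1]
  if (length(drop) > 0) table <- table[-drop, ]  # remove superfluous entries
  table
}

merge_ranges_sketch(sp, c(1, 2))
```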
A time-calibrated phylogenetic tree of Archosauria.
tree_archosauria
An object of class phylo with 13 tips and 12 internal nodes.
Combine two calibration matrices and fill in NA values in one with values from the other
tree.age.combine(ages0, ages1)
ages0 |
First matrix, whose NA values are to be replaced with values from the second matrix |
ages1 |
Matrix from which to take replacement values |
tree.age.combine builds the union of two calibration matrices if some of the values in one of them are NAs. If exact matches for some entries cannot be found, a relaxed search matching only the first word (i.e. usually the genus name) in each taxon name is run, in order to fill in as much of the age matrix as possible with non-NA values. It is highly recommended to manually inspect the resulting table for accuracy.
A two-column matrix containing earliest and latest occurrences for each taxon, with taxon names as row names
data(archosauria)
data(tree_archosauria)
tree.ages.spp(tree_archosauria,data=archosauria$sptab_Ornithopoda)->ages_A
tree.ages.spp(tree_archosauria,data=archosauria$sptab_Allosauroidea)->ages_B
tree.age.combine(ages_A,ages_B)->ages
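The two-stage matching (exact name first, then first word only) can be illustrated with a small hypothetical helper. fill_ages_sketch(), the column names and the assumption that both matrices share the same column order are illustrative and not taken from the package.

```r
# Hypothetical sketch (not paleoDiv code): fill NA cells of ages0 from ages1
fill_ages_sketch <- function(ages0, ages1) {
  first_word <- function(x) sub("[ _].*", "", x)  # usually the genus name
  for (taxon in rownames(ages0)) {
    if (!anyNA(ages0[taxon, ])) next
    hit <- match(taxon, rownames(ages1))           # exact search
    if (is.na(hit)) hit <- match(first_word(taxon), first_word(rownames(ages1)))  # relaxed search
    if (is.na(hit)) next
    missing <- is.na(ages0[taxon, ])               # assumes identical column order
    ages0[taxon, missing] <- ages1[hit, missing]
  }
  ages0
}

a <- matrix(c(NA, 201, NA, 190), nrow = 2,
            dimnames = list(c("Allosaurus_fragilis", "Plateosaurus"), c("FAD", "LAD")))
b <- matrix(c(157, 152), nrow = 1,
            dimnames = list("Allosaurus", c("FAD", "LAD")))
fill_ages_sketch(a, b)
```

Because the relaxed search can match a different species of the same genus, this kind of fill-in is exactly why the resulting table should be inspected manually.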
Automatically build matrix for time-calibration of phylogenetic trees using occurrence data.
tree.ages(phylo0 = NULL, data = NULL, taxa = NULL)
phylo0 |
Either an object of class phylo, or a character vector containing taxon names for building the matrix |
data |
Optional list()-object containing either taxon-range tables or occurrence datasets for all taxa. If NULL, data will be automatically downloaded via the pdb()-function |
taxa |
Deprecated argument; vector containing taxa to include in calibration matrix (can now be provided directly as phylo0) |
tree.ages works best for higher-level taxa (genus level and above) whose names can be used as a base_name in a query to the Paleobiology Database. It returns NAs for species names and for any other taxon that cannot be found in the Paleobiology Database or in the provided list object. For a function optimized to recover taxon ranges for genera and species, see tree.ages.spp(). It is highly recommended to manually inspect the resulting table for accuracy.
A two-column matrix containing earliest and latest occurrences for each taxon in taxa, with taxon names as row names
data(archosauria)
data(tree_archosauria)
tree.ages(tree_archosauria,data=archosauria)->ages
Automatically build matrix for time-calibration of phylogenetic trees using occurrence data.
tree.ages.spp(phylo0, data)
phylo0 |
Either an object of class phylo, or a character vector containing taxon names for building the matrix |
data |
A higher-level taxon name for which to download data from the Paleobiology Database, or a taxon-range table (data.frame) containing entries for the taxa in question. |
tree.ages.spp looks for the taxon names in the tna column of a taxon-range table (as produced by mk.sptab()), so it will only recover ages for taxa that are listed there. For a function optimized for higher-level taxa that might not be represented in such a table, see tree.ages(). It is highly recommended to manually inspect the resulting table for accuracy.
A two-column matrix containing earliest and latest occurrences for each taxon in phylo0, with taxon names as row names
data(archosauria)
data(tree_archosauria)
tree.ages.spp(tree_archosauria,data=archosauria$sptab_Ornithopoda)->ages
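The lookup against the tna column can be pictured as a simple name match. In this hypothetical sketch, tip labels are matched after replacing underscores with spaces (an assumption about how tree labels are normalized), and taxa absent from the table receive NA, mirroring the behaviour described above.

```r
# Toy taxon-range table (illustrative values)
sptab <- data.frame(
  tna = c("Iguanodon bernissartensis", "Hypsilophodon foxii"),
  max = c(126, 130),
  min = c(122, 125),
  stringsAsFactors = FALSE
)
tips <- c("Iguanodon_bernissartensis", "Hypsilophodon_foxii", "Maiasaura_peeblesorum")

# Hypothetical helper (not paleoDiv code)
lookup_ages_sketch <- function(tips, table) {
  hit <- match(gsub("_", " ", tips), table$tna)  # NA where a taxon is not listed
  ages <- cbind(max = table$max[hit], min = table$min[hit])
  rownames(ages) <- tips
  ages
}

lookup_ages_sketch(tips, sptab)
```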
Add a horizontal, period-level Phanerozoic timescale to any plot, especially calibrated phylogenies plotted with ape.
ts.periods( phylo = NULL, alpha = 1, names = TRUE, exclude = c("Quarternary"), col.txt = NULL, border = NA, ylim = NULL, adj.txt = c(0.5, 0.5), txt.y = mean, bw = FALSE, update = NULL )
phylo |
Optional (calibrated) phylogeny to which to add timescale. If phylogeny is provided, the $root.time variable is used to convert ages so that the time scale will fit the phylogeny. |
alpha |
Opacity value to use for the fill of the time scale |
names |
Logical indicating whether to plot period names (defaults to TRUE) |
exclude |
Character vector listing periods whose names should not be plotted, if names==TRUE |
col.txt |
Color(s) to use for labels. |
border |
Color to use for the border of the timescale |
ylim |
Setting for height of the timescale. Can either be one single value giving the height of the timescale, in which case the function attempts to use the lower limit of the current plot as the lower margin, or a vector of length 2 containing the lower and upper limits of the timescale. |
adj.txt |
Numeric vector of length==2 giving horizontal and vertical label alignment (defaults to centered, i.e. 0.5 for both values) |
txt.y |
Function to use to determine the vertical text position (defaults to mean, i.e. centered) |
bw |
Logical indicating whether to plot in black and white (defaults to FALSE). If TRUE, the time scale is drawn with a white background |
update |
Character string giving the filename of a .csv table for providing an updated timescale. If provided, the values for plotting the time scale are taken from the csv file instead of the internally provided values. Table must have columns named periods, bottom, top and col, giving the period names, start time in ma, end time in ma and a valid color value, respectively. |
Plots a timescale on the currently active plot.
data(tree_archosauria)
ape::plot.phylo(tree_archosauria)
ts.periods(tree_archosauria, alpha=0.5)
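The update argument expects a .csv table with the columns periods, bottom, top and col. A minimal sketch of writing such a file is shown below; the boundary ages are rounded approximations and the colours are the commonly used ICS chart colours, both chosen only for illustration. Whether the file is read with default read.csv() settings is an assumption.

```r
# Replacement period table with the columns described for ts.periods(update = ...)
periods <- data.frame(
  periods = c("Triassic", "Jurassic", "Cretaceous"),
  bottom  = c(252, 201, 145),   # start time in Ma (approximate)
  top     = c(201, 145, 66),    # end time in Ma (approximate)
  col     = c("#812B92", "#34B2C9", "#7FC64E"),
  stringsAsFactors = FALSE
)
csv_file <- file.path(tempdir(), "periods_custom.csv")
write.csv(periods, csv_file, row.names = FALSE)

# With paleoDiv loaded and a calibrated tree plotted, the file would be passed as:
# ts.periods(tree_archosauria, update = csv_file)
```

For ts.stages(), the same layout applies except that the name column is called stage instead of periods.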
Add a horizontal, stage-level Phanerozoic timescale to any plot, especially calibrated phylogenies plotted with ape.
ts.stages( phylo = NULL, alpha = 1, names = FALSE, col.txt = NULL, border = NA, ylim = NULL, adj.txt = c(0.5, 0.5), txt.y = mean, bw = FALSE, update = NULL )
phylo |
Optional (calibrated) phylogeny to which to add timescale. If phylogeny is provided, the $root.time variable is used to convert ages so that the time scale will fit the phylogeny. |
alpha |
Opacity value to use for the fill of the time scale |
names |
Logical indicating whether to plot stage names (defaults to FALSE) |
col.txt |
Color(s) to use for labels. |
border |
Color to use for the border of the timescale |
ylim |
Setting for height of the timescale. Can either be one single value giving the height of the timescale, in which case the function attempts to use the lower limit of the current plot as the lower margin, or a vector of length 2 containing the lower and upper limits of the timescale. |
adj.txt |
Numeric vector of length==2 giving horizontal and vertical label alignment (defaults to centered, i.e. 0.5 for both values) |
txt.y |
Function to use to determine the vertical text position (defaults to mean, i.e. centered) |
bw |
Logical indicating whether to plot in black and white (defaults to FALSE). If TRUE, the time scale is drawn with a white background |
update |
Character string giving the filename of a .csv table for providing an updated timescale. If provided, the values for plotting the time scale are taken from the csv file instead of the internally provided values. Table must have columns named stage, bottom, top and col, giving the stage names, start time in ma, end time in ma and a valid color value, respectively. |
Plots a timescale on the currently active plot.
data(tree_archosauria)
ape::plot.phylo(tree_archosauria)
ts.stages(tree_archosauria, alpha=0.7)
ts.periods(tree_archosauria, alpha=0)
Convert geological ages for accurate plotting alongside a calibrated phylogeny
tsconv(x, phylo0 = NULL, root.time = phylo0$root.time)
x |
A vector of geological ages to be converted. |
phylo0 |
Phylogeny from which to take root.time |
root.time |
Numeric root age, if not taken from a phylogeny |
A numeric vector containing the converted geological ages
tsconv(c(252,201,66), root.time=300)
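The conversion is needed because ape's default rightward plot places the root at x = 0 and measures branch lengths forward in time. Assuming that layout, an age in Ma maps to the plot coordinate root.time - age. The helper below is a hypothetical sketch of that mapping, not the package's code.

```r
# Hypothetical sketch: geological age (Ma) -> x coordinate on a calibrated ape plot,
# assuming the root is drawn at x = 0 and time runs towards the tips.
age_to_plot_x <- function(age, root.time) root.time - age

age_to_plot_x(c(252, 201, 66), root.time = 300)
# returns 48 99 234
```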
Generate a violin plot
viol( x, pos = 0, x2 = NULL, stat = density, dscale = 1, cutoff = range(x), horiz = TRUE, add = TRUE, lim = cutoff, xlab = "", ylab = "", fill = "grey", col = "black", lwd = 1, lty = 1, na.rm = FALSE, ... )
x |
Variable for which to plot violin. |
pos |
Position at which to place violin in the axis perpendicular to x. Defaults to 0 |
x2 |
Optional variable to override the use of x as input variable for the plotting statistic. If x2 is set, the function (default: density()) used to calculate the plotting statistic is run on x2 instead of x, but the results are plotted at the corresponding values of x. |
stat |
The plotting statistic. Defaults to the density() function, as in a standard violin plot, but can be overridden with any other function that takes x or x2 as its first argument. stat can also be a numeric vector of the same length as x, in which case the values in this vector are used instead of the function output and plotted against x as the independent variable. |
dscale |
The scale to apply to the values for density (or another plotting statistic). Defaults to 1, but adjustment may be needed depending on the scale of the plot the violin is to be added to. |
cutoff |
Setting for cropping the violin. Can be either a single value, in which case the input is interpreted as number of standard deviations from the mean, or a numeric vector of length 2, giving the lower and upper cutoff value directly. |
horiz |
Logical indicating whether to plot horizontally (defaults to TRUE) or vertically |
add |
Logical indicating whether to add to an existing plot (defaults to TRUE) or generate a new plot. |
lim |
Limits (in the dimensions of x) used for plotting, if add==FALSE. Defaults to cutoff, but can be manually set as a numeric vector of length 2, giving the lower and upper limits of the plot. |
xlab |
x axis label |
ylab |
y axis label |
fill |
Fill color for the plotted violin |
col |
Line color for the plotted violin |
lwd |
Line width for the plotted violin |
lty |
Line type for the plotted violin |
na.rm |
logical indicating whether to remove NA values from input data. |
... |
Other arguments to be passed on to function in parameter stat |
viol() is a versatile function for generating violin plots and adding them to R base graphics. The default plotting statistic is density(), resulting in a standard violin plot. However, density can be replaced by any function that takes x or x2 as its first argument, or by a numeric vector containing the data to be plotted, as long as this vector is the same length as x.
A violin plot and a data.frame containing the original and modified plotting statistic and independent variable against which it is plotted.
viol(x=c(1,2,2,2,3,4,4,3,2,2,3,3,4,5,3,3,2,2,1,6,7,6,9),pos=1, add=FALSE)
viol(c(1:10), width=9, stat=rmean, pos=0, add=FALSE)
viol(c(1:10), stat=c(11:20), pos=0, add=FALSE)
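To make the construction behind a density-based violin concrete, here is a minimal base-R sketch: the density curve is cropped to range(x) (like the default cutoff), scaled by dscale, mirrored around pos and drawn as one polygon. violin_sketch() is a hypothetical helper and does not reproduce all of viol()'s options.

```r
# Hypothetical helper (not paleoDiv code): horizontal violin from density()
violin_sketch <- function(x, pos = 0, dscale = 1, fill = "grey") {
  d <- stats::density(x)
  keep <- d$x >= min(x) & d$x <= max(x)  # crop, like cutoff = range(x)
  dx <- d$x[keep]
  dy <- d$y[keep] * dscale               # scale the plotting statistic
  graphics::polygon(c(dx, rev(dx)), pos + c(dy, -rev(dy)), col = fill)
  invisible(data.frame(x = dx, dens = dy))
}

x <- c(1,2,2,2,3,4,4,3,2,2,3,3,4,5,3,3,2,2,1,6,7,6,9)
plot(range(x), c(-1, 1), type = "n", xlab = "", ylab = "")
violin_sketch(x, pos = 0, dscale = 2)
```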
Wrapper around viol() to conveniently plot multiple violins on a single plot, analogous to the behavior of boxplot()
violins( x, data = NULL, group = NULL, horiz = FALSE, order = NULL, xlab = "", ylab = "", col = "black", fill = "grey", lwd = 1, lty = 1, dscale = 1, xlim = NULL, ylim = NULL, spaces = "_", add = FALSE, ax = TRUE, srt = 45, na.rm = TRUE, ... )
x |
plotting statistic (numeric vector) or formula object from which a plotting statistic and grouping variable can be extracted (i.e. of form x~group) |
data |
data.frame object containing the variables referenced in x (or x and group) |
group |
grouping variable |
horiz |
logical indicating whether to plot horizontally |
order |
order of factor levels of categorical factor |
xlab |
x axis label |
ylab |
y axis label |
col |
vector of border colors |
fill |
vector of fill colors |
lwd |
vector of line widths |
lty |
vector of line types |
dscale |
density scaling factors (numeric) to apply to individual violins |
xlim |
x limits (data limits used if NULL) |
ylim |
y limits (data limits used if NULL) |
spaces |
character string in group to replace with spaces for labels, if not NULL |
add |
logical whether to add to existing plot (default: FALSE) |
ax |
whether to plot axes |
srt |
angle for categorical axis text rotation |
na.rm |
logical indicating whether to tell viol() to remove NA values (defaults to TRUE) |
... |
other arguments to pass on to paleoDiv::viol() and plot() |
data.frame(p=rnorm(50), cat=rep(c("A","B","B","B","B"),10))->d
violins(p~cat,d)
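A formula such as p~cat is conceptually a split of the response by group, with one violin per factor level placed at successive positions. The base-R sketch below illustrates that grouping for vertical violins; using factor levels to set the order, and scaling each violin to a half-width of 0.4, are assumptions made for illustration rather than what violins() does internally.

```r
set.seed(1)
d <- data.frame(p = rnorm(50), cat = rep(c("A", "B", "B", "B", "B"), 10),
                stringsAsFactors = FALSE)

# One group per factor level; the level order sets the left-to-right order
groups <- split(d$p, factor(d$cat, levels = c("B", "A")))
dens <- lapply(groups, stats::density)

ylim <- range(unlist(lapply(dens, function(z) range(z$x))))
plot(c(0.5, length(groups) + 0.5), ylim, type = "n",
     xaxt = "n", xlab = "", ylab = "p")
axis(1, at = seq_along(groups), labels = names(groups))
for (i in seq_along(dens)) {
  w <- 0.4 * dens[[i]]$y / max(dens[[i]]$y)  # assumed half-width scaling
  polygon(i + c(w, -rev(w)), c(dens[[i]]$x, rev(dens[[i]]$x)), col = "grey")
}
```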