tcpips package
Submodules
tcpips.ani module
- tcpips.ani.ani_example(exp='ssp585', model='CESM', member='r4i1f1p1')
tcpips.bias module
Compare the CMIP6 historical period against ERA5.
We might want to load the raw variables, or the derived variables like potential intensity and potential size.
For each ensemble member then might want to calculate the mean bias for each grid point for each variable, but also if this bias has a trend, and perhaps how significant the bias is.
For the ensemble as a whole we might want to compare how the distribution of mean biases and their trends compare, and whether there are significant difference between different families of models.
TODO: Find linear trend in bias, CMIP6, and ERA5. Work out significance of each.
TODO: Compare the distribution of biases, trends, and significance between different models and ensemble members.
- tcpips.bias.calc_bias(start_year=1980, end_year=2014, model='CESM2', member='r4i1p1f1', exp='historical')
Calc mean biases over specified period.
- Parameters:
- Return type:
- tcpips.bias.load_cmip6_data(exp='historical', model='CESM2', member='r4i1p1f1')
Load CMIP6 data for a specific experiment, model, and member.
This function is a placeholder for loading CMIP6 data. The actual implementation would depend on the data source and format.
- tcpips.bias.read_ensemble_for_point_var(model='CESM2', member='r4i1p1f1', exp='historical', var='vmax', lat=29.95, lon=-90.07)
Read ensemble data for a specific point and variable.
- Parameters:
model (str) – Model name (default: “CESM2”).
member (str) – Ensemble member (default: “r4i1p1f1”).
exp (str) – Experiment name (default: “historical”).
var (str) – Variable to read (default: “vmax”).
lat (float) – Latitude of the point (default: 29.95).
lon (float) – Longitude of the point (default: -90.07).
- Returns:
- Data for the specified point and variable.
Will have dimension “year” instead of “time.
- Return type:
xarray.DataArray
tcpips.constants module
This file is used to save all possible project wide constants.
Includes source folder, the project path, etc.
Example
Import statement at top of script:
from tcpips.constants import PROJECT_PATH, FIGURE_PATH
tcpips.convert module
Convert CMIP6 variables to be PI input variables.
- tcpips.convert.convert(ds)
Convert CMIP6 variables to be PI input variables.
Assume that you can do a linear conversion between the variables:
y = m*x + c
m stored in CONVERSION_MULTIPLES, c stored in CONVERSION_ADDITIONS.
units are stored in CONVERSION_UNITS.
plev is a special case, it is converted to hPa, assuming it was Pa to start with.
- Parameters:
ds (xr.Dataset) – CMIP6 dataset.
- Returns:
PI dataset.
- Return type:
xr.Dataset
tcpips.dask_utils module
Dask utilities for parallel processing.
- tcpips.dask_utils.dask_cluster_wrapper(func, *args, **kwargs)
A wrapper to run a function on a Dask cluster. This is useful for running functions that take a long time to compute and can be parallelized across multiple workers.
I think this only works as long as you use it in the ‘if __name__ == “__main__”:’ part of the script. Not really sure.
- Return type:
tcpips.driver module
A driver to run the whole pipeline for CMIP6 data.
Steps: 1. Download data, 2. Regrid data, 3. Calculate potential intensity, 4. Calculate potential sizes 5. Calculate biases (possibly after regridding on to either ERA5 or the regridded CMIP6 grid).
In stages 1-2 The ouputs are stored in netcdf files, in stages 3-5 the outputs are stored in zarr format.
Maybe it could be good to control this function with a config file, so we can specify which steps to run for which experiments and models.
tcpips.era5 module
This script downloads monthly ERA5 reanalysis data from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS).
The script uses the cdsapi library to interact with the CDS API and download the data in NetCDF format. This requires an API key, which in my case I store at ~/.cdsapirc. You can get your own API key by signing up at the CDS website and following the instructions in the documentation.
The script is designed to download the data to calculate potential intensity (PI) and potential size (PS) for tropical cyclones. Geopotential height is also included so that we could calculate the height of the tropopause (not used).
- tcpips.era5.assign_coordinate_attributes(file_path)
Assigns CF-compliant ‘coordinates’ attributes to data variables.
This function opens a NetCDF file in append mode and inspects each variable. For any variable that has both ‘latitude’ and ‘longitude’ as dimensions (and is not a coordinate variable itself), it sets the ‘coordinates’ attribute to ‘longitude latitude’. This ensures compliance with CF conventions and improves interoperability with analysis tools.
- Parameters:
file_path (str) – The path to the NetCDF file to be modified.
Doctests: >>> # Setup a dummy NetCDF file for the test >>> file_name = ‘test_coord.nc’ >>> # ✅ Use the simpler NETCDF3 format to avoid HDF5 filesystem issues in CI >>> with nc.Dataset(file_name, ‘w’, format=’NETCDF3_CLASSIC’) as ds: … _ = ds.createDimension(‘latitude’, 10) … _ = ds.createDimension(‘longitude’, 20) … _ = ds.createDimension(‘time’, 2) … _ = ds.createVariable(‘latitude’, ‘f4’, (‘latitude’,)) … _ = ds.createVariable(‘longitude’, ‘f4’, (‘longitude’,)) … _ = ds.createVariable(‘time’, ‘i4’, (‘time’,)) … _ = ds.createVariable(‘vmax’, ‘f8’, (‘time’, ‘latitude’, ‘longitude’)) … _ = ds.createVariable(‘pmin’, ‘f8’, (‘time’, ‘latitude’, ‘longitude’)) >>> # Run the function >>> assign_coordinate_attributes(file_name) Processing variable: vmax… Added ‘coordinates’ attribute. Processing variable: pmin… Added ‘coordinates’ attribute. >>> # Verify the attributes were set >>> with nc.Dataset(file_name, ‘r’) as ds: … print(ds.variables[‘vmax’].coordinates) … print(ds.variables[‘pmin’].coordinates) longitude latitude longitude latitude >>> # Clean up the test file >>> os.remove(file_name)
- tcpips.era5.calculate_potential_sizes(start_year=1980, end_year=2024, dry_run=False)
Calculate the potential sizes of tropical cyclones from ERA5 data.
This is going to be a very expensive function to run.
- Parameters:
- Returns:
This function saves the results to a zarr file.
- Return type:
None
- tcpips.era5.call_cdo(input_path, output_path)
Call CDO to regrid the input netCDF file to a half-degree grid using multiple CPU cores.
Includes ncdump calls for debugging.
- tcpips.era5.download_era5_data(start_year=1980, end_year=2024)
Downloads ERA5 data for the specified years and months. This function is a wrapper around the download_single_levels and download_pressure_levels functions.
- Return type:
- tcpips.era5.download_pressure_levels(years, months, output_file, redo=False, stitch_together=True)
- Downloads monthly-averaged pressure-level fields:
Temperature (atmospheric profile) (PI and therefore PS)
Specific humidity (atmospheric profile) (PI and therefore PS)
Geopotential (atmospheric profile) (Neither, only for tropopause height)
The pressure levels below are specified in hPa. Note: Geopotential is provided in m²/s². Divide by ~9.81 to convert to geopotential height (meters).
The function handles the case where the number of years exceeds 10 by splitting the years into chunks of 10 years and then calling the function recursively, and finally stitching the temporary files together at the end.
Current download speed from Archer2 login node is around 25 minutes per decade (6.4 GB).
- Parameters:
years (List[str]) – List of years to download data for.
months (List[str]) – List of months to download data for.
output_file (str) – Name of the output file to save the data.
redo (bool) – If True, download the data again even if the file already exists. Default is False.
stitch_together (bool) – If True, stitch the temporary files together at the end. Default is True.
- Return type:
- tcpips.era5.download_single_levels(years, months, output_file, redo=False)
- Downloads monthly-averaged single-level fields in decadal chunks.
Sea surface temperature (for PI and PS)
Mean sea level pressure (for PI and PS)
Relative humidity (PS only)
- tcpips.era5.era5_pi(years)
Calculate potential intensity (PI) using the downloaded ERA5 data for a range of years. Go decade by decade.
- Return type:
- tcpips.era5.era5_pi_decade(single_level_path, pressure_level_path)
Optimized calculation of Potential Intensity for a decade.
- tcpips.era5.era5_pi_trends(start_year=1980, end_year=2024, months=1)
Let’s find the linear trends in the potential intensity
- Parameters:
start_year (int, optional) – The start year for the analysis. Defaults to DEFAULT_START
end_year (int, optional) – The end year for the analysis. Defaults to DEFAULT_END_YEAR.
months (int, optional) – The number of months to average over. Defaults to 1. - 1: Selects August for NH and February for SH. - 3: Averages over Aug-Sep-Oct for NH and Jan-Feb-Mar for SH.
- Return type:
- tcpips.era5.era5_ps_trends(start_year=1980, end_year=2024)
Let’s find the linear trends in the potential intensity
- tcpips.era5.find_tropical_m(start_year=1980, end_year=2024, months=1)
I want to find what the trends are for the tropical cyclone potential intensity and size over the pseudo observational period.
I defined m as the ratio of the change in the outflow temperature to the change in surface temperature. T_s = T_s0 + delta_t T_o = T_o0 + m * delta_t where T_s0 is the initial sea surface temperature, T_o0 is the initial outflow temperature, and delta_t is the change in temperature.
- Parameters:
- Returns:
This function saves the results to a file and does not return anything.
- Return type:
None
- tcpips.era5.fix_era5_pi_decade(output_path)
Currently the ERA5 PI data is saved in a format that cannot be read by cdo.
- Return type:
- tcpips.era5.get_all_data(start_year=1980, end_year=2024, chunk_in='space')
Get both the derived data from PI calculation, and the processed era5 data.
- tcpips.era5.get_all_regridded_data(start_year=1980, end_year=2024, chunk_in='space')
Get all regridded ERA5 data for the specified years.
- tcpips.era5.get_era5_combined(start_year=1980, end_year=2024, chunk_in='space')
Get the raw ERA5 data for potential intensity and size as a lazily loaded xarray dataset.
- tcpips.era5.get_era5_coordinates(start_year=1980, end_year=2024)
Get the coordinates of the ERA5 data.
Should have coordinates longitude, latitude, valid_time.
- tcpips.era5.get_era5_pi(start_year=1980, end_year=2024, chunk_in='space')
Load the potential intensity (PI) data calculated from ERA5 data.
- tcpips.era5.plot_snapshot_map(year=2020)
Plot a snapshot map of sst, vmax_3, rmax_3, r0_3, rmax_1, from the processed ERA5 potential sizes data.
- Return type:
- tcpips.era5.plot_trend_lineplots(lon, lat, label='new_orleans', start_year=1980, end_year=2024, months=1)
Plot the linear trend and variables for a specific point in the ERA5 dataset.
- Parameters:
lon (float) – Longitude of the point to plot.
lat (float) – Latitude of the point to plot.
label (str, optional) – Label for the plot. Defaults to “new_orleans”.
start_year (int, optional) – Start year for the data. Defaults to 1980.
end_year (int, optional) – End year for the data. Defaults to 2020
months (int, optional) – Number of months to average over. Defaults to 1.
- Return type:
- tcpips.era5.plot_trend_maps(start_year=1980, end_year=2024, months=1)
Plot trend in vmax, t0 and otl in another 3 panel subplot with no cbar label but labels in subplot title instead.
- tcpips.era5.plot_trends_all_lineplots()
- tcpips.era5.plot_var_on_map(ax, da, label, cmap, shrink=0.8, hatch_mask=None, second_hatch_mask=None, **kwargs)
Plot a variable on a geographic map.
- tcpips.era5.preprocess_pressure_level_data(ds)
Optimized preprocessing for pressure-level ERA5 data.
Converts temperature to Celsius and specific humidity to g/kg.
- Return type:
Dataset
- tcpips.era5.preprocess_single_level_data(ds)
Optimized preprocessing for single-level ERA5 data.
- Return type:
Dataset
- tcpips.era5.rechunk_file(file_name, new_chunks={'latitude': 321, 'longitude': 720, 'year': -1})
Rechunk a zarr file to have larger chunks for latitude and longitude.
- tcpips.era5.regrid_all(years)
Regrid all the ERA5 data for the given years. :type years:
List[str] :param years: List of years to regrid. :type years: List[str]- Return type:
- tcpips.era5.select_seasonal_hemispheric_data(ds, months_to_average=1, lon='longitude', lat='latitude')
Selects seasonal hemispheric data, with an option to average over peak months.
- Parameters:
ds (xr.Dataset) – An xarray Dataset with ‘time’, ‘latitude’, and ‘longitude’ coordinates. ‘time’ must be a datetime-like coordinate.
months_to_average (int) – The number of months to average. - 1: Selects August for NH and February for SH. - 3: Averages over Aug-Sep-Oct for NH and Jan-Feb-Mar for SH.
- Returns:
A new Dataset with an integer ‘year’ dimension, containing the combined seasonal and hemispheric data.
- Return type:
xr.Dataset
>>> # --- Test Setup --- >>> # Create a 2-year dataset where the data value is the month number. >>> import pandas as pd >>> time = pd.date_range('2020-01-01', '2021-12-31', freq='MS') >>> lats = [-20, 20] # One SH, one NH point >>> lons = [100] >>> month_data = time.month.values.reshape(-1, 1, 1) * np.ones((len(time), len(lats), len(lons))) >>> ds_test = xr.Dataset( ... {'some_variable': (['time', 'latitude', 'longitude'], month_data)}, ... coords={'time': time, 'latitude': lats, 'longitude': lons} ... )
>>> # --- Test Case 1: 3-Month Average --- >>> result_3m = select_seasonal_hemispheric_data(ds_test, months_to_average=3) >>> # For NH (lat=20), average of Jul(7), Aug(8), Sep(9) should be 8.0 >>> assert int(result_3m.sel(year=2021, latitude=20).some_variable) == 8 >>> # For SH (lat=-20), average of Jan(1), Feb(2), Mar(3) should be 2.0 >>> assert int(result_3m.sel(year=2020, latitude=-20).some_variable) == 2 >>> # Check that the dimensions are correct and 'time' is replaced by 'year'. >>> assert sorted(result_3m.dims) == ['latitude', 'longitude', 'year'] >>> assert result_3m.year.values.tolist() == [2020, 2021]
>>> # --- Test Case 2: 1-Month Selection (Original behavior) --- >>> result_1m = select_seasonal_hemispheric_data(ds_test, months_to_average=1) >>> # For NH (lat=20), the value for August (month 8) should be 8. >>> assert int(result_1m.sel(year=2021, latitude=20).some_variable) == 8 >>> # For SH (lat=-20), the value for February (month 2) should be 2. >>> assert int(result_1m.sel(year=2020, latitude=-20).some_variable) == 2
- tcpips.era5.trend_with_neweywest_full(y)
Calculates the linear trend (slope, intercept) and its p-value using OLS with Newey-West standard errors.
- Parameters:
x (#) – A 1D array representing the time variable (e.g., years).
y (np.ndarray) – A 1D array representing the time series.
- Returns:
A tuple with (slope, intercept, p-value).
- Return type:
Doctests: >>> import numpy as np >>> # Test with a simple increasing trend >>> y = np.array([1, 2, 3, 4, 5]) >>> slope, intercept, p_value = trend_with_neweywest_full(y) >>> round(slope, 2) 1.0 >>> round(intercept, 2) 1.0 >>> round(p_value, 4) < 0.05 True
tcpips.era5_ps_calc module
Calculating the potential size is very computationally intensive, so we use dask to parallelize the work, ideally across many nodes on the Archer2 cluster.
- tcpips.era5_ps_calc.process_large_dataset_ex(client, size)
Process a large dataset using Dask. This function creates a large random dataset and performs a mean calculation on it.
tcpips.files module
File handling utilities for tc potential size and intensity calculations.
Keep the file pipeline clean and organized with these functions.
- tcpips.files.file_crawler(folder_to_search='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/raw')
Function to crawl through a directory and return a dictionary of experiments, models, members.
- tcpips.files.find_atmos_ocean_pairs(path='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/regridded', new_path='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/pi')
Find the atmospheric and oceanic data pairs that can be combined to calculate potential intensity.
- tcpips.files.find_missing_raw_cmip6_nc(file_d)
Find missing files in the file dictionary.
- tcpips.files.get_task_dict(original_root='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/raw', new_root='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/regridded')
Find all tasks to be done. Return a dictionary of tasks.
- Parameters:
- Returns:
dictionary of tasks.
- Return type:
TODO: adapt so it can work for tasks that take in single files, rathe than ocean and atmos
- tcpips.files.hist_dict_plot(hist_d)
Plot histogram of file sizes.
- tcpips.files.histogram_dict_from_file_dict(file_d)
Function to create a dictionary of file sizes for plotting histograms.
- tcpips.files.investigate_cmip6_pairs(path='/home/docs/checkouts/readthedocs.org/user_builds/worstsurge/checkouts/latest/data/cmip6/regridded')
Investigate the CMIP6 pairs to see if they are the correct size.
- tcpips.files.locker(path)
Decorator to lock a function to prevent multiple instances from running on the same file. I plan to use this for regridding a reprocessing steps, but this could be used for downloads.
- Parameters:
path (str) – path to stage directory where locks are stored.
- Returns:
decorator function
- Return type:
callable
Example
>>> TEST_PATH = os.path.join(CMIP6_PATH, "test") >>> import shutil >>> shutil.rmtree(TEST_PATH, ignore_errors=True) >>> os.makedirs(TEST_PATH, exist_ok=True) >>> @locker(path=TEST_PATH) ... def example(*args, **kw) -> bool: ... ret = os.path.exists(os.path.join(TEST_PATH, f"{kw['exp']}.{kw['model']}.{kw['member']}.lock")) ... return ret >>> example(exp="historical", model="model", member="member") True >>> def write_lock_file(): ... with open(os.path.join(TEST_PATH, "historical.model.member.lock"), "w") as f: ... f.write(time_stamp()) >>> write_lock_file() >>> example(exp="historical", model="model", member="member") Already running example on historical.model.member >>> shutil.rmtree(TEST_PATH)
tcpips.ibtracs module
Process IBTrACS data to calculate along-track potential intensity and size from ERA5.
We only want tracks from 1980-2024 (inclusive). We currently calculate the potential intensity and size using monthly ERA5 data.
TODO: test the effect of excluding e.g. colder water or higher latitudes.
i.e. hypotheses: supersize storms are caused by storms that have extratropical components, or through inertial effects as the potential size rapidly decreases over a patch of cold water.
TODO: standardize all the terminology to Cat1 Potential Inner Size (r_1), Corresponding Potential Inner Size (r_2), PI Potential Innner size (r_3).
- tcpips.ibtracs.add_pi_ps_back_onto_tracks(variables_to_map=('rmax', 'vmax', 'r0', 'pm', 'otl', 't0'), input_name='IBTrACS.since1980.v04r01.unique.nc', era5_name='era5_unique_points_ps.nc', output_name='IBTrACS.since1980.v04r01.pi_ps.nc')
We first need to use the unique counter to map the unique points back onto the IBTrACS data.
- Parameters:
variables_to_map (Tuple[str], optional) – The variables to map back onto the IBTrACS data. Defaults to (“rmax”, “vmax”, “r0”, “pm”, “otl”, “t0”).
input_name (str, optional) – The name of the input IBTrACS dataset with unique counter. Defaults to “IBTrACS.since1980.v04r01.unique.nc”.
output_name (str, optional) – The name of the output IBTrACS dataset with potential intensity and size data added. Defaults to “IBTrACS.since1980.v04r01.pi_ps.nc”.
- tcpips.ibtracs.add_saffir_simpson_boundaries(ax, color='gray', linestyle='--', linewidth=0.8, alpha=0.7, zorder=1)
Adds dashed horizontal lines to a Matplotlib Axes object at the Saffir-Simpson hurricane category boundaries (wind speeds in m/s).
- Parameters:
ax (matplotlib.axes.Axes) – The Axes object to draw on.
color (str, optional) – Color of the lines. Defaults to ‘gray’.
linestyle (str, optional) – Style of the lines. Defaults to ‘–‘.
linewidth (float, optional) – Width of the lines. Defaults to 0.8.
alpha (float, optional) – Transparency of the lines. Defaults to 0.7.
zorder (int, optional) – The zorder for drawing the lines. Lines with lower zorder are drawn first. Defaults to 1 (typically behind most plotted data).
- tcpips.ibtracs.ani_vary_v_cps(v_reduc=0.8, name=b'KATRINA', basin=b'NA', subbasin=b'GM', timestep_end=80, steps=20, v33=33)
Animate the potential size trade off for varying velocities.
- Parameters:
v_reduc (float, optional) – Reduction factor for the velocity. Defaults to 0.8.
name (str, optional) – Name of the tropical cyclone to plot. Defaults to b”KATRINA”.
basin (str, optional) – Basin of the tropical cyclone to plot. Defaults to b”NA”.
subbasin (str, optional) – Subbasin of the tropical cyclone to plot. Defaults to b”GM”.
timestep_end (int, optional) – Timestep to select for the potential intensity and size calculation. Defaults to 20.
steps (int, optional) – Number of steps to vary the velocity. Defaults to 20.
v33 (float, optional) – Threshold for category 1 hurricane in m/s. Defaults to 33.
- Return type:
- tcpips.ibtracs.before_2025(ds)
Filter the dataset to only include data before 2025.
- Return type:
Dataset
- tcpips.ibtracs.calculate_cps(v_reduc=0.8, test=False)
Calculate the corresponding potential size, assuming the windspeed from usa_wind/V_reduc instead of the potential intensity. This will mean that we have to do many more potential size calculations.
- tcpips.ibtracs.calculate_cps_ds(tc_inputs_ds)
Calculate the potential size dataset from the tropical cyclone inputs.
- Parameters:
tc_inputs_ds (xr.Dataset) – Dataset containing the tropical cyclone inputs.
- Returns:
Dataset containing the potential size data.
- Return type:
xr.Dataset
- tcpips.ibtracs.calculate_grid_avg_over_ibtracs_points(ibtracs_ds, vars=('normalized_rmax', 'normalized_vmax'))
Average the ibtracs data onto the era5 grid.
Similar to the calculate_grid_avg_over_unique_points function, but not assuming the latitude and longitudes given are era5 grid points.
- Parameters:
ibtracs_ds (xr.Dataset) – The IBTrACS dataset to average. Each variable should have dimensions (“storm”, “date_time”). The latitude and longitude coordinates similarly are have dimensions (“storm”, “date_time”). The tracks are of variable lengths, and for those points the lons and lats are nans.
vars (tuple, optional) – The variables to average. Defaults to (“normalized_rmax”, “normalized_vmax”). The variables should be in the ibtracs dataset.
- Returns:
The averaged dataset with dimensions (“latitude”, “longitude”). Will also return the count of points in each grid cell.
- Return type:
xr.Dataset
- tcpips.ibtracs.calculate_grid_avg_over_unique_points(ds, variables=('sst', 'msl', 't2m'))
Create for each gridpoints the average over all unique points.
- Parameters:
ds (xr.Dataset) – The unique point dataset to average over.
variables (tuple) – The variables to average over.
- Returns:
The averaged dataset.
- Return type:
xr.Dataset
This function will take the unique points dataset and average over the gridpoints of the ERA5 coordinates.
- tcpips.ibtracs.calculate_potential_intensity()
Calculate the potential intensity of the cyclone at each timestep using the ERA5 data.
- tcpips.ibtracs.calculate_potential_size(chunks=20)
Calculate the size of the cyclone at each timestep using the ERA5 data.
- tcpips.ibtracs.calculate_potential_size_cat1(vmin=33, v_reduc=0.8)
Calculate potential size, assuming vmax@10m is 33 m/s, the minimum for a category 1 cyclone. Only calculate the potential size where the potential intensity is above this threshold.
- tcpips.ibtracs.check_sizes()
Loop through all of the IBTrACS datasets used as inputs and outputs and print their sizes.
- tcpips.ibtracs.compare_normalized_potential_size(lower_wind_vp=33, lower_wind_obs=33)
Compare normalized potential size.
- tcpips.ibtracs.create_exceedance_table(lower_wind_vp=33, lower_wind_obs=33)
Try and impose different filters and see how the exceedance changes.
- Return type:
- tcpips.ibtracs.create_normalized_variables(v_reduc=0.8)
Create normalized variables for rmax and vmax.
- tcpips.ibtracs.download_ibtracs_data()
Download IBTrACS data if not already downloaded
Currently gets the post 1980 data.
- Return type:
- tcpips.ibtracs.era5_unique_points_raw()
Get the data from the unique points in the ERA5 data, in order to be able to calculate potential intensity and size.
This function relies on os.path.join(IBTRACS_DATA_PATH, “unique_era5_points.nc”) already existing.
Need to do vectorized indexing, which in xarray only seems to work with sel rather than isel, hence we need to index by longitude, latitude, time.
- Return type:
- tcpips.ibtracs.filter_by_labels(ds, filter=[('basin', [b'NA']), ('subbasin', [b'GM']), ('nature', [b'TS']), ('usa_record', [b'L'])])
Filter by labels for IBTrACS.
- tcpips.ibtracs.get_exceedance_val(lower_wind_vp=33, lower_wind_obs=33, min_sst_c=None, max_abs_lat=None)
Get the exceedance values for the normalized variables.
- Parameters:
lower_wind_vp (float, optional) – Lower wind speed threshold for potential intensity. Defaults to 33 m/s.
lower_wind_obs (float, optional) – Lower wind speed threshold for observed wind speed. Defaults to 33 m/s.
min_sst_c (Optional[float], optional) – Minimum sea surface temperature in Celsius for filtering. Defaults to None.
max_abs_lat (Optional[float], optional) – Maximum absolute latitude for filtering. Defaults to None.
- Returns:
Dictionary containing the exceedance values for the normalized variables.
- Return type:
- tcpips.ibtracs.get_normalized_data(lower_wind_vp=33, lower_wind_obs=33, v_reduc=0.8, max_abs_lat=None, min_sst_c=None, emanuel_filter=None)
Get the normalized data from the IBTrACS dataset.
- Parameters:
lower_wind_vp (float, optional) – Lower wind speed threshold for potential intensity. Defaults to 33.0 m/s.
lower_wind_obs (float, optional) – Lower wind speed threshold for observed wind speed. Defaults to 33.0 m/s.
v_reduc (float, optional) – Reduction factor for the potential intensity. Defaults to 0.8.
max_abs_lat (float, optional) – Maximum absolute latitude for filtering. Defaults to None.
min_sst_c (float, optional) – Minimum sea surface temperature in Celsius for filtering. Defaults to None.
emanuel_filter (Optional[Literal["landfall_limited", "cold_water_limited", "unlimited"]], optional) – Defaults to None
- Returns:
The normalized dataset.
- Return type:
xr.Dataset
- tcpips.ibtracs.get_tc_ds(basin=b'NA', subbasin=b'GM', name=b'KATRINA')
Get a tropical cyclone from the IBTrACS dataset.
- Parameters:
- Returns:
Dataset containing the tropical cyclone data.
- Return type:
xr.Dataset
- tcpips.ibtracs.get_vary_vobs_lower_data(num=6, lower_vobs_min=18, lower_vobs_max=83, lower_wind_vp=33, regenerate=False)
Get the exceedance of normalized variables for varying lower limits of observed wind speed. :type num:
int:param num: Number of points to plot. Defaults to 6. :type num: int, optional :type lower_vobs_min:float:param lower_vobs_min: Minimum lower limit for observed wind speed. Defaults to 18 m/s. :type lower_vobs_min: float, optional :type lower_vobs_max:float:param lower_vobs_max: Maximum lower limit for observed wind speed. Defaults to 83 m/s. :type lower_vobs_max: float, optional :type lower_wind_vp:float:param lower_wind_vp: Lower limit for potential intensity wind speed. Defaults to 33 m/s. :type lower_wind_vp: float, optional :type regenerate: :param regenerate: If True, regenerate the dataset even if it exists. Defaults to False. :type regenerate: bool, optional- Returns:
Dataset containing the exceedance of normalized variables for varying lower limits of observed wind speed.
- Return type:
xr.Dataset
- tcpips.ibtracs.get_vary_vp_lower_data(num=6, lower_vp_min=33, lower_vp_max=83, lower_wind_obs=33)
Vary the lower limits for the normalized variables, and see how exceedance changes.
- Parameters:
num (int, optional) – Number of points to plot. Defaults to 6.
lower_vp_min (float, optional) – Minimum lower limit for potential intensity. Defaults to 33.
lower_vp_max (float, optional) – Maximum lower limit for potential intensity. Defaults to 83.
lower_wind_obs (float, optional) – Lower wind speed threshold for observed wind speed. Defaults to 33.
- Returns:
Dataset containing the exceedance of normalized variables for varying lower limits of potential intensity wind speed.
- Return type:
xr.Dataset
- tcpips.ibtracs.highlight_rapid_intensification(ax, times, wind_speeds, ri_threshold_knots=30, ri_period_hours=24, color='red', alpha=0.3, zorder=0)
Highlights periods of rapid intensification (RI) on a Matplotlib Axes object.
RI is defined as an increase in wind speed by at least ri_threshold_knots over a ri_period_hours duration. The highlighted period on the graph will be from (t - ri_period_hours) to t, where t is the time RI is achieved.
- Parameters:
ax (matplotlib.axes.Axes) – The Axes object to draw on.
times (array-like) – 1D array of time values (e.g., in hours). Must be sorted in ascending order.
wind_speeds (array-like) – 1D array of wind speeds (e.g., in m/s), corresponding to times.
ri_threshold_knots (float, optional) – The wind speed increase threshold for RI in knots. Defaults to 30 knots.
ri_period_hours (float, optional) – The time period over which the intensification is measured, in hours. Defaults to 24 hours.
color (str, optional) – Color for the highlighted regions. Defaults to ‘gold’.
alpha (float, optional) – Transparency for the highlighted regions. Defaults to 0.3.
zorder (int, optional) – The zorder for drawing the highlighted regions. Lower zorder is drawn first. Defaults to 0 (behind grid/plots).
- Return type:
- tcpips.ibtracs.ibtracs_to_era5_map()
Find the corresponding ERA5 data for each IBTrACS storm.
This function will create a unique counter for each unique (x, y, t) point in the ERA5 data that corresponds to an IBTrACS storm. The unique counter will be added to the IBTrACS data as a new variable. The unique points will be saved to a JSON file.
Currently takes around 36 seconds to run on laptop.
Parallelization could be implemented if we ignored having a unique counter for each point, however this is not the rate limting step.
TODO: Currently I only get ERA5 data up until the end of 2024, but IBTRACS data goes until mid 2025. This means that the last few storms will erroneously select december 2024 as the closest time point.
- tcpips.ibtracs.landings_only(ds)
Extract a reduced dataset based on those points at which a landing occurs.
- Parameters:
ds (xr.Dataset) – Individual storm.
- Returns:
- Clipped storm. If there are no tropical cyclones
hitting the coatline the coastline then None is returned.
- Return type:
Optional[xr.Dataset]
- tcpips.ibtracs.na_landing_tcs()
Get North Atlantic tropical storms that made landfall in the USA.
- Returns:
Filtered dataset.
- Return type:
xr.Dataset
- tcpips.ibtracs.plot_cps()
Plot the corresponding potential size, assuming the windspeed from usa_wind/V_reduc instead of the potential intensity.
- tcpips.ibtracs.plot_era5_processed()
Plot the processed ERA5 unique datapoints selected to include the key variables for calculating potential intensity and size.
- Return type:
- tcpips.ibtracs.plot_example_raw()
- tcpips.ibtracs.plot_normalized_cps()
Plot the normalized rmax for the CPS data.
- tcpips.ibtracs.plot_normalized_quad(lower_wind_vp=33, lower_wind_obs=33, min_sst_c=None, max_abs_lat=None, plot_storms=False)
Plot the normalized variables from the IBTrACS dataset:
- Parameters:
lower_wind_vp (float, optional) – Lower wind speed threshold for filtering. Defaults to 33.0 m/s.
lower_wind_obs (float, optional) – Lower wind speed threshold for observed wind speed filtering. Defaults to 33.0 m/s.
min_sst_c (Optional[float], optional) – Minimum sea surface temperature in Celsius for filtering. Defaults to None.
max_abs_lat (Optional[float], optional) – Maximum absolute latitude for filtering. Defaults to None.
plot_storms (bool, optional) – If True, plot the storms with their normalized variables. Defaults to False.
- Return type:
- tcpips.ibtracs.plot_normalized_quad_dual(lower_wind_vp=33, lower_wind_obs=33)
Plot the additional filtered data alongside the original data.
Refactored quad plot to make it cleaner.
- tcpips.ibtracs.plot_potential_intensity()
Provide example plots using the potential intensity data.
- tcpips.ibtracs.plot_tc_example(name=b'KATRINA', basin=b'NA', subbasin=b'GM', bbox=(-92.5, -72.5, 22.5, 37.5), landing_no=-1)
Plot an example of a TC’s track with the potential intensity and potential size, and corresponding potential size.
- Parameters:
name (bytes, optional) – Name of the tropical cyclone to plot. Defaults to b”KATRINA”.
basin (bytes, optional) – Basin of the tropical cyclone to plot. Defaults to b”NA”.
subbasin (bytes, optional) – Subbasin of the tropical cyclone to plot. Defaults to b”GM”.
bbox (Optional[Tuple[float, float, float, float]], optional) – Bounding box for the map. Defaults to (-92.5, -72.5, 22.5, 37.5).
landing_no (int, optional) – Index of the landing to plot. Defaults to -1 (last landing).
- Return type:
- tcpips.ibtracs.plot_tc_track_by_index(index, plot_name)
Plot a tropical cyclone track by index.
- tcpips.ibtracs.plot_unique_points()
- tcpips.ibtracs.plot_var_on_map(ax, da, label, cmap, shrink=1, **kwargs)
Plot a variable on a geographic map.
- tcpips.ibtracs.process_era5_raw()
Process the raw ERA5 unique datapoints selected to include the key variables for calculating potential intensity and size.
- Return type:
- tcpips.ibtracs.run_all_plots()
Re-run all the plot (not the calculations) in the correct order.
- tcpips.ibtracs.save_basin_names()
Save the tropical cyclones in each basin to a file.
- tcpips.ibtracs.select_tc_from_ds(ds, name=b'KATRINA', basin=b'NA', subbasin=b'GM')
Select a tropical cyclone from the dataset.
- Parameters:
- Returns:
Dataset containing only a particular tropical cyclone’s data.
- Return type:
xr.Dataset
- tcpips.ibtracs.vary_limits(num=6)
Vary the lower limits for the normalized variables, and see how exceedance changes.
- Parameters:
num (int, optional) – Number of points to plot. Defaults to 6.
- tcpips.ibtracs.vary_v_cps(v_reduc=0.8, name=b'KATRINA', basin=b'NA', subbasin=b'GM', timestep=30, steps=60, v33=33)
I want to vary the velocity input to the potential size calculation from the threshold for category 1 hurricane to the potential intensity, and plot the corresponding potential size in terms of rmax.
- Parameters:
v_reduc (float, optional) – Reduction factor for the velocity. Defaults to 0.8.
name (str, optional) – Name of the tropical cyclone to plot. Defaults to b”KATRINA”.
basin (str, optional) – Basin of the tropical cyclone to plot. Defaults to b”NA”.
subbasin (str, optional) – Subbasin of the tropical cyclone to plot. Defaults to b”GM”.
timestep (int, optional) – Timestep to select for the potential intensity and size calculation. Defaults to 20.
steps (int, optional) – Number of steps to vary the velocity. Defaults to 60.
v33 (float, optional) – Threshold for category 1 hurricane in m/s. Defaults to 33.
- Return type:
tcpips.pangeo module
Pangeo download and processing scripts.
This script is used to download CMIP6 data from the Pangeo Google cloud store and process it for use in the TCPIPS project to calculate potential size and potential intensity.
Folder structure:
${ROOT}/${processing-step}/${experiment}/${atm/oc}/${model}/${member_id}.nc
- tcpips.pangeo.combined_experiments_from_cat_subset(cat_subset, experiments, oc_or_at='atmos', **kwargs)
Combine experiments from a catalog subset.
- tcpips.pangeo.combined_experiments_from_dset_dict(dset_dict, experiments, oc_or_at='atmos', **kwargs)
Function to combine experiments together.
This is really misnamed and should be called something like save_experiments.
It is used to save the experiments to disk.
- tcpips.pangeo.get_data_pair(exp='ssp585', institution_id='THU', source_id='CIESM')
Get data from Pangeo for a particular institution source pair.
Have a look at the CMIP6 data available on Pangeo:
- tcpips.pangeo.get_data_part(experiments=['historical', 'ssp585'], institution_id='NCAR', table='Omon', oc_or_at='ocean', source_id='CESM2-SE')
Get data part from one of the experiments.
- Parameters:
- Return type:
tcpips.pi module
Potential Intensity Calculation script.
- tcpips.pi.calculate_pi(ds, dim='p', fix_temp=False, V_reduc=1.0)
Calculate the potential intensity using the tcpyPI package.
Data must have been converted to the tcpyPI units by `tcpips.convert’.
- Parameters:
ds (xr.Dataset) – xarray dataset containing the necessary variables.
dim (str, optional) – Vertical dimension. Defaults to “p” for pressure level.
fix_temp (bool, optional) – Whether to fix the temperature profile. Defaults to True.
V_reduc (float, optional) – Reduction factor for the wind speed. Defaults to 1.0.
- Returns:
xarray dataset containing the calculated variables.
- Return type:
xr.Dataset
Example
>>> ds = xr.Dataset(data_vars={"sst": (["x", "y"], [[20, 30], [30, 32]], ... {"units": "degrees_Celsius"}), ... "msl": (["x", "y"], [[1000, 1005], [1010, 1015]], ... {"units": "hPa"}), ... "t": (["x", "y", "p"], ... [[[np.nan, 23], [30, 21]], [[30, 21], [np.nan, 23]]], ... {"units": "degrees_Celsius"}), ... "q": (["x", "y", "p"], ... [[[10, 20], [30, 40]], [[50, 60], [70, 80]]], ... {"units": "g/kg"})}, ... coords={"x": (["x"], [-80, -85], {"units": "degrees_East"}), ... "y": (["y"], [20, 25], {"units": "degrees_North"}), ... "p": (["p"], [1000, 850], {"units": "hPa"})}) >>> pi_ds = calculate_pi(ds, fix_temp=True)
- tcpips.pi.fix_profile(ds, method='lapse_rate', max_temp_c=100.0)
Fills missing values in a vertical temperature and specific humdity profile using a principled method.
Methods: 1. ‘interpolate’: Fills internal NaNs using linear interpolation. Cannot fill
NaNs at the top or bottom of the profile.
‘lapse_rate’: Fills NaNs at the bottom of the profile by extrapolating downwards from the lowest valid data point using a dry adiabatic lapse rate. This is ideal for fixing missing surface-level pressure data.
‘well_mixed’: Fills NaNs at the bottom of the profile by assuming a well-mixed boundary layer. This method fills both temperature and specific humidity profiles consistently and is physically principled for tropical atmospheres.
- Parameters:
ds (xr.Dataset) – Xarray dataset containing temperature ‘t’ with a vertical pressure coordinate ‘p’.
method (str, optional) – The method to use: ‘interpolate’ or ‘lapse_rate’. Defaults to “lapse_rate”.
max_temp_c (float, optional) – Plausibility check. Temperatures above this value (in Celsius) are set to NaN. Defaults to 100.0.
- Returns:
Dataset with the temperature profile ‘t’ filled.
- Return type:
xr.Dataset
Example
>>> p_coords = [1000, 925, 850] >>> t_data = [[[np.nan, 290, 285], [305, 298, 292]], # Profile 1 has NaN at bottom ... [[np.nan, np.nan, 280], [302, 296, 290]]] # Profile 2 has two NaNs at bottom >>> ds = xr.Dataset( ... data_vars={ ... "t": (("x", "y", "p"), t_data, {"units": "K"}), ... }, ... coords={ ... "p": (("p",), p_coords, {"units": "hPa"}), ... "x": (("x",), [1, 2]), "y": (("y",), [1, 2]) ... }, ... ) >>> ds_fixed = fix_profile(ds, method='lapse_rate') >>> # For profile 1: T_ref=290K at p_ref=925hPa. T(1000) = 290 * (1000/925)**0.286 >>> expected_t1 = 290 * (1000/925)**KAPPA >>> # For profile 2: T_ref=280K at p_ref=850hPa. T(1000) = 280 * (1000/850)**0.286 etc. >>> expected_t2_925 = 280 * (925/850)**KAPPA >>> expected_t2_1000 = 280 * (1000/850)**KAPPA >>> np.testing.assert_almost_equal(ds_fixed.t.values[0, 0, 0], expected_t1, decimal=2) >>> np.testing.assert_almost_equal(ds_fixed.t.values[1, 0, 1], expected_t2_925, decimal=2) >>> np.testing.assert_almost_equal(ds_fixed.t.values[1, 0, 0], expected_t2_1000, decimal=2)
- tcpips.pi.reconstruct_profile_well_mixed(ds, temp_var='t', hum_var='q', p_coord='p', max_temp_c=100.0)
Reconstructs incomplete atmospheric profiles assuming a well-mixed boundary layer.
This function fills missing values (NaNs) at the bottom of vertical temperature and humidity profiles. It operates on the principle that the planetary boundary layer is well-mixed, meaning potential temperature (theta) and specific humidity (q) are constant with height in this layer.
The method proceeds as follows: 1. For each vertical profile, it identifies the lowest valid data point (the
reference level).
It assumes that potential temperature and specific humidity are constant from this reference level down to the surface (higher pressures).
It extrapolates these constant values downward to fill any NaNs.
The constant potential temperature is then converted back to in-situ temperature at each pressure level.
This approach is physically principled for climatological studies of the tropical atmosphere and ensures consistent treatment of both temperature and humidity, which is critical for accurate Potential Intensity (PI) calculations.
- Parameters:
ds (xr.Dataset) – Xarray dataset containing atmospheric profiles. Must include temperature, humidity, and a pressure coordinate.
temp_var (str) – The name of the temperature variable in the dataset (e.g., ‘t’, ‘ta’). Units must be Kelvin.
hum_var (str) – The name of the specific humidity variable (e.g., ‘q’, ‘hus’). Units should be kg/kg or dimensionless.
p_coord (str) – The name of the vertical pressure coordinate (e.g., ‘p’, ‘plev’). Units must be hPa or Pa.
max_temp_c (float) – A plausibility check. Temperatures above this value (in Celsius) are treated as missing data. Defaults to 100.0.
- Returns:
A new dataset with the temperature and humidity profiles filled.
- Return type:
xr.Dataset
tcpips.pi_driver module
Potential Intensity Driver for CMIP6 Data.
tcpips.plot module
tcpips.plot utilities.
- tcpips.plot.plot_features(plot_ds, features, units=None, names=None, vlim=None, super_titles=None)
A wrapper around the feature_grid function to plot the features of a dataset for the potential intensity inputs/outputs.
- Parameters:
plot_ds (xr.Dataset) – The dataset to plot data from.
features (List[List[str]]) – List of feature names to plot.
units (Optional[List[List[str]]], optional) – Units to plot. Defaults to None.
names (Optional[List[List[str]]], optional) – Names to plot. Defaults to None.
vlim (Optional[List[List[Tuple[str, float, float]]]], optional) – Colormap/vlim to plot. Defaults to None.
super_titles (Optional[List[str]], optional) – Supertitles to plot. Defaults to None.
- Returns:
The figure and axes of the plot.
- Return type:
Tuple[plt.Figure, List[List[plt.Axes]]]
tcpips.ps_driver module
A driver for calculating potential sizes using CMIP6 data.
There are two types of potential size we should calculate: (1) The potential size corresponding to the storm at its potential intensity, and (2) the potential size corresponding to the storm at the lower bound of 33 m/s for a category 1 tropical cyclone.
Each of these types of potential size requires a calculation with a different “vmax” input. They will produce a different output for rmax and r0 (the radius of maximum wind and the radius of vanishing winds, respectively).
Need to load Potential intensity data and regridded data.
tcpips.regrid module
tcpips.regrid_cdo module
Regrid CMIP6 data using CDO
This script provides functionality to regrid CMIP6 data using the Climate Data Operators (CDO) tool.
- tcpips.regrid_cdo.call_cdo(input_path, output_path)
Call cdo to regrid the input file to the output file.
This function uses the cdo command line tool to regrid the input file to the output file. It first prints the header of the input file, then deletes the unnecessary coordinates, and finally remaps the file bilinearly using cdo. The output file is saved in the specified output path. The temporary file created during the process is deleted after the process is completed.
tcpips.rh module
Relative Humidity Calculation Module.
This module provides functions to calculate relative humidity based on dew point temperature and air temperature. It includes a function to calculate the saturation pressure of water vapor at a given temperature.
https://earthscience.stackexchange.com/a/24167
- tcpips.rh.relative_humidity_from_dew_point(dew_point_temp, temperature)
Calculate the relative humidity given the dew point temperature and air temperature.
tcpips.run_dask_calculation module
tcpips.simple_sensitivity module
Work out the sensitivity of potential intensity to changes in temperature.
- tcpips.simple_sensitivity.simple_sensitivity(delta_t=1, k=0.07, m=1, avg_to_tropical_ocean=0.8, T_s0=300, T_00=200)
Simple sensitivity analysis for the change in potential intensity based on the change in temperature and humidity.
- Parameters:
delta_t (float) – Change in average global surface temperature in Kelvin.
k (float) – Scaling factor for humidity change based on CC (default 0.07).
m (float) – Scaling factor for the change in outflow temperature (default 1).
avg_to_tropical_ocean (float) – Average temperature change for tropical ocean compared to global temperature (default 0.8).
- Return type:
tcpips.xr_utils module
Some xarray utility functions.
- tcpips.xr_utils.propagate_attrs(ds_old, ds_new)
Propagate the standard_name and units attributes from one dataset to another.
- Parameters:
ds_old (xr.Dataset) – dataset with standard_name and units attributes.
ds_new (xr.Dataset) – dataset with standard_name and units attributes.
- Returns:
dataset with standard_name and units attributes.
- Return type:
xr.Dataset
- tcpips.xr_utils.propagate_wrapper(func)
Wrapper to propagate the standard_name and units attributes from one dataset to another.
- Parameters:
func (Callable[xr.Dataset, xr.Dataset]) – function to wrap.
- Returns:
wrapped function.
- Return type:
Callable[xr.Dataset, xr.Dataset]
- tcpips.xr_utils.standard_name_to_long_name(ds)
Turn the standard_name attribute into a long_name attribute.
- Parameters:
ds (xr.Dataset) – dataset with standard_name attributes.
- Returns:
dataset with long_name attributes.
- Return type:
xr.Dataset