GloH2O

MSWX

Multi-Source Weather

Overview

Mul­ti-Source Weath­er (MSWX) is an oper­a­tional, high-res­o­lu­tion, bias-cor­rect­ed mete­o­ro­log­i­cal prod­uct with glob­al cov­er­age from 1979 to 7 months from now.
Oth­er mete­o­ro­log­i­cal prod­ucts, such as ERA5, HydroGFD, PGF, and WFDE5, are not avail­able in near real-time, lack freely avail­able fore­casts, and have a coarse spa­tial res­o­lu­tion. MSWX com­bines the best data sources for each time-scale and elim­i­nates sys­tem­at­ic bias­es to pro­vide an effec­tive and read­i­ly avail­able solu­tion for use in oper­a­tional mod­el­ing appli­ca­tions. In addi­tion, MSWX is com­pat­i­ble with GloH2O’s Mul­ti-Source Weight­ed-Ensem­ble Pre­cip­i­ta­tion (MSWEP) prod­uct, which merges gauge, satel­lite, and reanaly­sis data to obtain the high­est qual­i­ty pre­cip­i­ta­tion estimates. 

Accurate

Based on state-of-the-art, high-res­o­lu­tion, bias-cor­rect­ed data sources

Consistent

Seam­less glob­al data record from 1979 to 7 months from now

Near real-time

Near real-time avail­abil­i­ty to sup­port time-crit­i­cal applications

Transparent

Methods published in an open-access, peer-reviewed scientific journal

Air tem­per­a­ture (°C)
Com­par­i­son between fore­cast­ed air tem­per­a­ture from MSWX-Long and ECMWF-SEAS5 on March 11, 2021, illus­trat­ing the enhanced detail and bias cor­rec­tion pro­vid­ed by MSWX. ECMWF-SEAS5 is gen­er­al­ly con­sid­ered the best sea­son­al fore­cast­ing sys­tem cur­rent­ly avail­able. Data rep­re­sent the first ensem­ble mem­ber of the fore­cast ini­tial­ized on Feb­ru­ary 1, 2021. 

Methodology

The fol­low­ing paper describes the MSWX method­ol­o­gy in detail: 
MSWX con­sists of four sub-prod­ucts rep­re­sent­ing the his­tor­i­cal record (MSWX-Past), the near real-time exten­sion (MSWX-NRT), the medi­um-range fore­cast ensem­ble (MSWX-Mid), and the sea­son­al fore­cast ensem­ble (MSWX-Long). The MSWX-NRT, ‑Mid, and ‑Long sub-prod­ucts are har­mo­nized with MSWX-Past to obtain a homo­ge­neous record from 1979 to 7 months from now. The fol­low­ing table pro­vides details on each sub-product. 
Sub-product           MSWX-Past           MSWX-NRT                   MSWX-Mid                 MSWX-Long
Description           Historical record   Near real-time extension   Medium-range forecasts   Long-range forecasts
Temporal resolution   3-hourly            3-hourly                   3-hourly                 Daily
Spatial resolution    0.1°                0.1°                       0.1°                     0.1°
Latency               ~5 days             ~3 hours                   -                        -
Update frequency      Daily               6-hourly                   Daily (at ~08:00 UTC)    Monthly (on the 13th)
Forecast horizon      -                   -                          10 days                  7 months
Initialization        -                   -                          00:00 UTC                First of each month
Ensemble size         1                   1                          30                       51
Data source(s)        ERA5, CHELSA        GDAS                       GEFS                     SEAS5
Real-time infor­ma­tion about the sta­tus of the MSWX pro­duc­tion sys­tem is avail­able here.

Data license

MSWX is released under the Cre­ative Com­mons Attri­bu­tion-Non­Com­mer­cial 4.0 Inter­na­tion­al (CC BY-NC 4.0) license. Please con­tact us if you are affil­i­at­ed with a com­mer­cial enti­ty and want to use MSWX. If you do not have a com­mer­cial affil­i­a­tion and you intend to use the prod­uct for non-com­mer­cial pur­pos­es, please send us a request using the fol­low­ing form. You will receive a link to the Google Dri­ve con­tain­ing MSWX once your request has been approved. 

Frequently asked questions

ERA5, HydroGFD, PGF, and WFDE5 are exam­ples of oth­er glob­al mete­o­ro­log­i­cal data prod­ucts. How­ev­er, these prod­ucts suf­fer from four main draw­backs com­pared to MSWX:

  1. They are not avail­able in near real-time and, there­fore, they can­not be used to oper­a­tional­ly mon­i­tor weath­er as it occurs. Although ERA5 and HydroGFD are updat­ed to 5 days from real-time, this is insuf­fi­cient­ly time­ly for most oper­a­tional appli­ca­tions. In addi­tion, the HydroGFD updates are only avail­able commercially.
  2. None of the oth­er prod­ucts pro­vide con­sis­tent and freely avail­able fore­casts and, there­fore, they can­not by them­selves be used to pro­vide advance warn­ing of impend­ing weath­er. As a workaround, sev­er­al oper­a­tional flood fore­cast­ing sys­tems com­bine his­tor­i­cal, near real-time, and fore­cast data from dif­fer­ent incon­sis­tent sources, which affects the reli­a­bil­i­ty of the warn­ings issued by these sys­tems. Although ECMWF fore­casts are, to a cer­tain degree, con­sis­tent with ERA5, they are only avail­able commercially.
  3. The other products have coarse spatial resolutions (≥0.25°) and thus are unable to represent mountainous regions. This is concerning as mountainous regions supply a large share of the world’s population with freshwater.
  4. None of the oth­er prod­ucts incor­po­rate satel­lite-based pre­cip­i­ta­tion retrievals to enhance the per­for­mance in con­vec­tion-dom­i­nat­ed regions and peri­ods. Admit­ted­ly, nei­ther does MSWX, but MSWX is com­pat­i­ble with GloH2O’s Mul­ti-Source Weight­ed-Ensem­ble Pre­cip­i­ta­tion (MSWEP), which merges gauge, satel­lite, and reanaly­sis data to pro­vide the high­est qual­i­ty pre­cip­i­ta­tion estimates.
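To put the resolutions in item 3 into perspective, the following minimal sketch converts a grid resolution in degrees to an approximate east-west spacing in kilometers. It assumes a spherical Earth with an equatorial circumference of about 40,075 km; the function name `grid_spacing_km` is illustrative and not part of any MSWX tooling.

```python
import math

# Approximate length of one degree of longitude at the equator (km),
# assuming an equatorial circumference of about 40,075 km.
KM_PER_DEGREE_AT_EQUATOR = 40075.0 / 360.0


def grid_spacing_km(resolution_deg, latitude_deg=0.0):
    """Approximate east-west grid spacing in km at a given latitude."""
    return (
        resolution_deg
        * KM_PER_DEGREE_AT_EQUATOR
        * math.cos(math.radians(latitude_deg))
    )


mswx_spacing = grid_spacing_km(0.1)     # the 0.1° grid used by MSWX
coarse_spacing = grid_spacing_km(0.25)  # the 0.25° threshold mentioned above
```

Under these assumptions, a 0.1° grid corresponds to roughly 11 km at the equator, whereas a 0.25° grid corresponds to roughly 28 km.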
There are sev­er­al com­mer­cial mete­o­ro­log­i­cal data prod­ucts on the mar­ket from com­pa­nies like Cus­tomWeath­er, Oiko­Lab, Spire, Storm Glass, Vaisala, the Weath­er Com­pa­ny, and Weath­er Source. Their web­sites and brochures are gen­er­al­ly filled with fan­cy buzz­words (e.g., “hyper-local”), astro­nom­i­cal num­bers (e.g., “2 mil­lion grid points”), and con­fus­ing ter­mi­nol­o­gy (e.g., “vir­tu­al weath­er sta­tion”) intend­ed to per­suade you to buy their prod­ucts. In real­i­ty, how­ev­er, their prod­ucts suf­fer from a num­ber of seri­ous issues: 
  1. Mate­r­i­al prod­uct details (e.g., spa­tio-tem­po­ral res­o­lu­tion, tem­po­ral span, and data sources) are usu­al­ly not dis­closed, mak­ing it dif­fi­cult to find out what you’re real­ly getting.
  2. Many com­mer­cial prod­ucts rep­re­sent noth­ing more than repack­aged ver­sions of freely avail­able datasets (pri­mar­i­ly from NOAA, NASA, and ECMWF). While these com­mer­cial prod­ucts may be more read­i­ly acces­si­ble, they have the exact same char­ac­ter­is­tics as their free counterparts.
  3. Commercial products often have a coarse spatial resolution (≥0.25°) and are, therefore, unable to represent mountainous regions. Weather Source’s OnPoint Climatology, for example, has an effective resolution of just 0.5° (equivalent to approximately 55 km at the equator; estimated from maps in their brochure, although they claim the resolution is 5 km).
  4. Com­mer­cial prod­ucts often com­bine dif­fer­ent incon­sis­tent data sources with­out per­form­ing any sta­tis­ti­cal har­mo­niza­tion, result­ing in spu­ri­ous jumps in the time series. Oiko­Lab, for exam­ple, com­bines ERA5 his­tor­i­cal data with GFS fore­casts, despite the sub­stan­tial sys­tem­at­ic dif­fer­ences between these two products.
  5. The algo­rithms under­ly­ing com­mer­cial prod­ucts are always kept secret, per­haps because the algo­rithms lack sophis­ti­ca­tion or because the com­pa­nies fear los­ing their edge. Either way, buy­ers are expect­ed to sim­ply take them at their word that their algo­rithms are tru­ly state-of-the-art.
  6. Commercial forecast products are often deterministic (one single future scenario) as opposed to probabilistic (multiple future scenarios). Deterministic forecasts can give users a false sense of confidence and preclude quantification of the forecast uncertainty, which is essential to make well-informed decisions.
  7. Val­i­da­tion results are rarely avail­able for com­mer­cial prod­ucts, pre­sum­ably because no val­i­da­tion was per­formed or because the prod­uct per­formed unfavorably.
  8. Com­mer­cial prod­ucts are not avail­able for free for non-com­mer­cial pur­pos­es, per­haps because turn­ing a prof­it is the only pri­or­i­ty or to avoid unfa­vor­able val­i­da­tion outcomes.
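Item 6 above contrasts deterministic and probabilistic forecasts. As a minimal sketch of what an ensemble adds, the example below turns a set of ensemble members into an exceedance probability. The member values are made up for illustration and are not MSWX data, and the function name `exceedance_probability` is hypothetical.

```python
from statistics import mean


def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose value lies above the threshold."""
    return sum(1 for value in members if value > threshold) / len(members)


# Hypothetical 10-member ensemble of forecast air temperature (°C) for one
# grid cell and one day. These values are invented for illustration only.
members = [28.1, 30.4, 31.2, 29.8, 33.0, 27.5, 30.9, 32.1, 29.0, 31.7]

ensemble_mean = mean(members)                           # single "best guess"
prob_above_30 = exceedance_probability(members, 30.0)   # uncertainty-aware view
```

A deterministic product would report only one value, comparable to a single member or the mean, whereas the ensemble also shows how many scenarios exceed the threshold.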

Downloading data from shared Google Drive folders is relatively easy using cloud storage manager software such as rclone. The following instructions explain how to set up rclone and download MSWX:

  1. Down­load and install rclone.
  2. Link rclone to your Google account by fol­low­ing the steps in this video.
  3. Access the MSWX shared fold­er by vis­it­ing it via your brows­er. Check if the shared fold­er is list­ed under “Shared with me” on your Google Dri­ve page.
  4. Con­firm that rclone can find the shared fold­er:
    $ rclone lsd --drive-shared-with-me GoogleDrive:
              -1 2021-02-03 10:14:35        -1 MSWX_V100
  5. Prepare a filter file (filter-file.txt) that specifies the type of data to download. In this example, we download monthly mean air temperature data from the historical (Past), near real-time (NRT), and seasonal forecast (S2S) sub-products. The medium-range forecasts (Fcst) do not have monthly data and thus are not downloaded.
    + /Past/Temp/Monthly/*.nc
    + /NRT/Temp/Monthly/*.nc
    + /S2S/Temp/**/Monthly/*.nc
    - /Fcst/**
    - *
    
  6. Exe­cute rclone. Remove the --dry-run argu­ment once you are sure the right data get down­loaded.
    $ rclone sync -v --filter-from filter-file.txt --drive-shared-with-me GoogleDrive:/MSWX_V100 c:/Temp/MSWX_V100 --dry-run
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/202101.nc: Skipped copy as --dry-run is set
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/202103.nc: Skipped copy as --dry-run is set
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/202102.nc: Skipped copy as --dry-run is set
    ...

    If the down­load is inter­rupt­ed, the com­mand can be run again, and files that already exist will be skipped.
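The filter file from step 5 can also be generated programmatically. The following is a minimal sketch that reproduces the rules shown above. The folder layout is taken from that example; `Temp` is the only variable folder name confirmed here, so any other value passed as `variable` is an assumption. The function names are illustrative.

```python
def build_filter_rules(variable="Temp"):
    """Return rclone filter rules for monthly data of one variable.

    The folder layout mirrors the filter file in step 5. Only "Temp" is
    confirmed by that example; other variable names are assumptions.
    """
    return [
        f"+ /Past/{variable}/Monthly/*.nc",
        f"+ /NRT/{variable}/Monthly/*.nc",
        f"+ /S2S/{variable}/**/Monthly/*.nc",
        "- /Fcst/**",  # medium-range forecasts have no monthly data
        "- *",         # exclude everything not matched above
    ]


def write_filter_file(path, variable="Temp"):
    """Write the filter rules to a text file, one rule per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(build_filter_rules(variable)) + "\n")
```

Calling `write_filter_file("filter-file.txt")` would produce a file that can be passed to rclone with `--filter-from`, as in step 6.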

MSWX-NRT air tem­per­a­ture data for a par­tic­u­lar day (e.g., Feb­ru­ary 28, 2021) can be down­loaded using rclone as fol­lows:
rclone sync -v --drive-shared-with-me GoogleDrive:/MSWX_V100/NRT/Temp/Daily/2021059.nc ./
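The data can be read and plotted using MATLAB as follows:
% Read the air temperature grid and transpose it for display
global_temp = ncread('2021059.nc','air_temperature')';
% Show the grid as an image with a fixed color range of -50 to 40 °C
imagesc(global_temp,[-50 40]);
colorbar
title('Air temperature on February 28, 2021 (°C)')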
The same data can be read and plotted using Python as follows:
from netCDF4 import Dataset
import matplotlib.pyplot as plt

# Read the air temperature grid and drop any singleton (e.g., time) dimension
dataset = Dataset('2021059.nc','r')
global_temp = dataset.variables['air_temperature'][:].squeeze()
dataset.close()

# Show the 2-D grid as an image with a fixed color range of -50 to 40 °C
plt.imshow(global_temp,vmin=-50,vmax=40)
plt.colorbar()
plt.title("Air temperature on February 28, 2021 (°C)")
plt.show()
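The daily file name used above (2021059.nc for February 28, 2021) appears to consist of the four-digit year followed by the three-digit day of the year. Assuming that convention holds for all daily files, which is inferred from this single example, a file name can be derived from a date as sketched below. The function name `daily_filename` is illustrative.

```python
from datetime import date


def daily_filename(day):
    """Build a daily file name as year + zero-padded day of year + '.nc'.

    This naming convention is inferred from the example 2021059.nc
    (February 28, 2021) and is an assumption for other files.
    """
    return day.strftime("%Y%j") + ".nc"


example_name = daily_filename(date(2021, 2, 28))
```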

MSWX-Past rep­re­sents the his­tor­i­cal record start­ing in 1979 and end­ing ~5 days from real-time (updat­ed dai­ly). MSWX-NRT rep­re­sents the near real-time exten­sion to ~3 hours from real-time (updat­ed every 6 hours). MSWX-Mid rep­re­sents the 30-mem­ber medi­um-range (up to 10 days) fore­cast ensem­ble (updat­ed dai­ly). MSWX-Long rep­re­sents the 51-mem­ber long-range (up to 7 months) fore­cast ensem­ble (updat­ed month­ly). See the MSWX paper for more details.
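As a rough illustration of how the four sub-products tile the time axis, the sketch below picks the sub-product expected to cover a given timestamp. It treats the approximate latencies and horizons quoted above as exact, measures the forecast horizons from the current time rather than from the forecast initialization, and approximates 7 months as 210 days. It is therefore a simplification and not the production logic. The function name `subproduct_for` is illustrative.

```python
from datetime import datetime, timedelta, timezone

RECORD_START = datetime(1979, 1, 1, tzinfo=timezone.utc)


def subproduct_for(target, now):
    """Return the MSWX sub-product expected to cover `target`, or None."""
    if target < RECORD_START:
        return None
    if target <= now - timedelta(days=5):    # ~5 days latency
        return "MSWX-Past"
    if target <= now - timedelta(hours=3):   # ~3 hours latency
        return "MSWX-NRT"
    if target <= now + timedelta(days=10):   # 10-day horizon
        return "MSWX-Mid"
    if target <= now + timedelta(days=210):  # ~7-month horizon (approximation)
        return "MSWX-Long"
    return None


now = datetime(2021, 3, 11, 12, 0, tzinfo=timezone.utc)
last_month = subproduct_for(now - timedelta(days=30), now)
yesterday = subproduct_for(now - timedelta(days=1), now)
next_week = subproduct_for(now + timedelta(days=7), now)
next_season = subproduct_for(now + timedelta(days=90), now)
```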

MSWX-Past and MSWEP are both based on bias-cor­rect­ed and down­scaled atmos­pher­ic mod­el out­put, but MSWEP addi­tion­al­ly incor­po­rates gauge and satel­lite data. Pre­cip­i­ta­tion esti­mates from MSWEP are thus like­ly more accu­rate than those from MSWX-Past in (a) dense­ly gauged regions and (b) con­vec­tion-dom­i­nat­ed regions due to the satel­lite data.

Note that the two products are largely compatible, as MSWEP inherits the reference cumulative distribution function (CDF) from MSWX-Past. MSWX-NRT, -Mid, and -Long can thus be used to extend MSWEP to the present and into the future. Minor inconsistencies may, however, be present due to the inclusion of satellite and gauge data in MSWEP.
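To illustrate what sharing a reference CDF can mean in practice, the sketch below applies empirical quantile mapping, a common way to make one dataset follow the distribution of another. This is a generic technique shown under simplifying assumptions and is not necessarily the procedure used for MSWX or MSWEP. The function name `quantile_map` and the synthetic data are illustrative.

```python
import numpy as np


def quantile_map(values, source_sample, reference_sample, n_quantiles=101):
    """Map `values` from the source distribution onto the reference one.

    Quantiles of both samples are estimated at the same probabilities, and
    values are interpolated from source quantiles to reference quantiles.
    Assumes continuous data without many ties (e.g., temperature).
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    source_q = np.quantile(source_sample, probs)
    reference_q = np.quantile(reference_sample, probs)
    return np.interp(values, source_q, reference_q)


# Synthetic example: the source has a constant +2 offset from the reference.
rng = np.random.default_rng(0)
reference = rng.normal(10.0, 3.0, 1000)
source = reference + 2.0
corrected = quantile_map(source, source, reference)
```

In this synthetic case the mapping removes the constant offset, so the corrected values follow the reference distribution.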

The choice between precipitation from MSWX-Past and MSWEP depends on the application and involves a tradeoff between accuracy and consistency. In general, MSWX-Past is less accurate but more consistent with MSWX-NRT, -Mid, and -Long, whereas MSWEP is more accurate but potentially less consistent with MSWX-NRT, -Mid, and -Long (due to the inclusion of the satellite and gauge data). For calibrating a hydrological model using historical data, for example, we would recommend using MSWEP. However, for placing MSWX-Mid data in a historical perspective, we would recommend MSWX-Past.
