Multi-Source Weather


Multi-Source Weather (MSWX) is an operational, high-resolution (3-hourly 0.1°), bias-corrected meteorological product with global coverage from 1979 to 7 months from now.
Other meteorological products, such as ERA5, HydroGFD, PGF, and WFDE5, are not available in near real-time, lack freely available forecasts, and have a coarse spatial resolution (≥0.25°). MSWX combines the best data sources for each time-scale and eliminates systematic biases to provide an effective and readily available solution for use in operational modeling applications. In addition, MSWX is compatible with GloH2O’s Multi-Source Weighted-Ensemble Precipitation (MSWEP) product, which merges gauge, satellite, and reanalysis data to obtain the highest quality precipitation estimates.


Based on state-of-the-art, high-resolution, bias-corrected data sources


Seamless global data record from 1979 to 7 months from now

Near real-time 

Near real-time availability to support time-critical applications


Methods published in open-access peer-reviewed scientific journal

Air temperature (°C)
Comparison between air temperature forecasts for Europe from MSWX-Long and ECMWF-SEAS5 on March 11, 2021, illustrating the enhanced detail and bias correction provided by MSWX. ECMWF-SEAS5 is generally considered the best seasonal forecasting system currently available. Data represent the first ensemble member of the forecast initialized on February 1, 2021.


The following paper describes the MSWX methodology in detail:
MSWX consists of four sub-products representing the historical record (MSWX-Past), the near real-time extension (MSWX-NRT), the medium-range forecast ensemble (MSWX-Mid), and the seasonal forecast ensemble (MSWX-Long). The MSWX-NRT, -Mid, and -Long sub-products are harmonized with MSWX-Past to obtain a homogeneous record from 1979 to 7 months from now. The following table provides details on each sub-product.
Historical record
Near real-time extension
Medium-range forecasts
Long-range forecasts
Temporal resolution
Spatial resolution
~5 days
1.5 to 4.5 hours
Update frequency
Daily (06:30–10:00 UTC)
Monthly (on the 13th)
Forecast horizon
10 days
7 months
00:00 UTC
First of each month
Ensemble size
Data source(s)
Real-time information about the status of the MSWX production system is available here. An interactive web viewer of monthly MSWX-Past data for a selection of variables is available here.

Data license

MSWX is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Please contact us if you are affiliated with a commercial entity and want to use MSWX. If you do not have a commercial affiliation and you intend to use the product for non-commercial purposes, please send us a request using the following form. You will receive a link to the Google Drive containing MSWX once your request has been approved.

Frequently asked questions

ERA5, HydroGFD, PGF, and WFDE5 are examples of other global meteorological data products. However, these products suffer from four main drawbacks compared to MSWX:

  1. They are not available in near real-time and, therefore, they cannot be used to operationally monitor weather as it occurs. Although ERA5 and HydroGFD are updated to 5 days from real-time, this is insufficiently timely for most operational applications. In addition, the HydroGFD updates are only available commercially.
  2. None of the other products provide consistent and freely available forecasts and, therefore, they cannot by themselves be used to provide advance warning of impending weather. As a workaround, several operational flood forecasting systems combine historical, near real-time, and forecast data from different inconsistent sources, which affects the reliability of the warnings issued by these systems. Although ECMWF forecasts are, to a certain degree, consistent with ERA5, they are only available commercially.
  3. The other products have coarse spatial resolutions (≥0.25°) and thus are unable to represent mountainous regions. This is concerning as mountainous regions supply a large share of the world’s population with freshwater.
  4. None of the other products incorporate satellite-based precipitation retrievals to enhance the performance in convection-dominated regions and periods. Admittedly, neither does MSWX, but MSWX is compatible with GloH2O’s Multi-Source Weighted-Ensemble Precipitation (MSWEP), which merges gauge, satellite, and reanalysis data to provide the highest quality precipitation estimates.

There are several commercial meteorological data products on the market from companies like CustomWeather, OikoLab, Spire, Storm Glass, Vaisala, the Weather Company, and Weather Source. Their websites and brochures are generally filled with fancy buzzwords (e.g., “hyper-local”), astronomical numbers (e.g., “2 million grid points”), and confusing terminology (e.g., “virtual weather station”) intended to persuade you to buy their products. In reality, however, their products suffer from a number of issues:

  1. Material product details (e.g., spatio-temporal resolution, temporal span, and data sources) are usually not disclosed, making it difficult to find out what you’re really getting.
  2. Many commercial products represent nothing more than repackaged versions of freely available datasets (primarily from NOAA, NASA, and ECMWF). While these repackaged commercial versions may be more readily accessible, the data are identical to their free counterparts.
  3. Commercial products are often advertised as “high resolution” but in reality have a coarse spatial resolution (≥0.25°) and are, therefore, unable to represent mountainous regions.
  4. Commercial products often combine different inconsistent data sources without performing any statistical harmonization, resulting in spurious jumps in the time series.
  5. Any algorithms underlying commercial products are always kept secret. Buyers are expected to simply take them at their word that their algorithms provide added value.
  6. Commercial forecast products are often deterministic (one single future prediction) as opposed to probabilistic (multiple future predictions). Deterministic forecasts can give users a false sense of confidence and preclude quantification of the forecast uncertainty, which is essential to make well-informed decisions.
  7. Validation results are rarely available for commercial products, presumably because no validation was performed or because the product performed unfavorably.
  8. Commercial products are rarely available for free for non-commercial purposes. Helping advance science or supporting non-profits has a low priority for these companies. They may also want to avoid unfavorable validation outcomes.
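The difference between deterministic and probabilistic forecasts (point 6 above) can be illustrated with a minimal Python sketch. It assumes a hypothetical list of ensemble-member air temperatures for a single grid cell and lead time; the values are made up for illustration and are not MSWX data, and the function name is ours.

```python
from statistics import mean


def exceedance_probability(members, threshold):
    """Return the fraction of ensemble members strictly above `threshold`."""
    if not members:
        raise ValueError("ensemble is empty")
    return sum(1 for m in members if m > threshold) / len(members)


# Hypothetical ensemble members (°C); illustrative values only, not MSWX data
members = [1.2, -0.4, 0.8, 2.1, -1.5, 0.3, 1.7, -0.2]

# A deterministic product would report a single number such as the mean...
ensemble_mean = mean(members)

# ...whereas an ensemble also supports statements about uncertainty
spread = max(members) - min(members)
p_frost = 1.0 - exceedance_probability(members, 0.0)

print(f"mean = {ensemble_mean:.2f} °C, spread = {spread:.2f} °C, "
      f"P(T <= 0 °C) = {p_frost:.3f}")
```

With a single deterministic value, only the first of these three quantities would be available.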

Downloading data from shared Google Drive folders is relatively easy using cloud storage manager software such as rclone. The following instructions explain how to set up rclone and download MSWX:

  1. Download and install rclone.
  2. Link rclone to your Google account by following the steps in this video.
  3. Access the MSWX shared folder by visiting it via your browser. Check if the shared folder is listed under “Shared with me” on your Google Drive page.
  4. Confirm that rclone can find the shared folder:
    $ rclone lsd --drive-shared-with-me GoogleDrive:
              -1 2021-02-03 10:14:35        -1 MSWX_V100
  5. Prepare a filter file (filter-file.txt) that specifies the type of data we want to download. In this example, we download monthly mean air temperature data from the historical (Past), near real-time (NRT), and seasonal forecast (S2S) folders. The medium-range forecasts (Fcst) do not have monthly data and thus are not downloaded.
    + /Past/Temp/Monthly/*.nc
    + /NRT/Temp/Monthly/*.nc
    + /S2S/Temp/**/Monthly/*.nc
    - /Fcst/**
    - *
  6. Execute rclone. Remove the --dry-run argument once you are sure the right data get downloaded.
    $ rclone sync -v --filter-from filter-file.txt --drive-shared-with-me GoogleDrive:/MSWX_V100 c:/Temp/MSWX_V100 --dry-run
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/ Skipped copy as --dry-run is set
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/ Skipped copy as --dry-run is set
    2021/02/21 07:42:15 NOTICE: S2S/Temp/20201201_00/49/Monthly/ Skipped copy as --dry-run is set

    If the download is interrupted, the command can be run again, and files that already exist will be skipped.
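Steps 5 and 6 can also be scripted. The following Python sketch writes the filter file shown above and assembles the same rclone command; the helper names (`write_filter_file`, `build_rclone_command`, `run_download`) are our own, and the remote name `GoogleDrive:` and the destination path are taken from the example and may differ on your system. It is a convenience wrapper under those assumptions, not part of MSWX or rclone.

```python
import subprocess
from pathlib import Path

# Filter rules copied from step 5 (monthly air temperature only)
FILTER_RULES = [
    "+ /Past/Temp/Monthly/*.nc",
    "+ /NRT/Temp/Monthly/*.nc",
    "+ /S2S/Temp/**/Monthly/*.nc",
    "- /Fcst/**",
    "- *",
]


def write_filter_file(path):
    """Write the filter rules to `path`, one rule per line."""
    target = Path(path)
    target.write_text("\n".join(FILTER_RULES) + "\n")
    return target


def build_rclone_command(filter_file, destination, dry_run=True):
    """Assemble the rclone call from step 6 as an argument list."""
    cmd = [
        "rclone", "sync", "-v",
        "--filter-from", str(filter_file),
        "--drive-shared-with-me",
        "GoogleDrive:/MSWX_V100",  # remote name as configured in step 2
        str(destination),
    ]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be copied
    return cmd


def run_download(filter_file, destination, dry_run=True):
    """Run rclone; raises CalledProcessError if rclone exits with an error."""
    subprocess.run(build_rclone_command(filter_file, destination, dry_run),
                   check=True)
```

A typical call would be `run_download(write_filter_file("filter-file.txt"), "c:/Temp/MSWX_V100")`, followed by a second call with `dry_run=False` once the listed files look right.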

MSWX-NRT air temperature data for a particular day (e.g., February 28, 2021) can be downloaded using rclone as follows:
rclone sync -v --drive-shared-with-me GoogleDrive:/MSWX_V100/NRT/Temp/Daily/ ./
The data are read and plotted using MATLAB as follows:
% Insert the path to the downloaded NetCDF file between the first pair of quotes
global_temp = ncread('','air_temperature')';
imagesc(global_temp,[-50 40]);
title('Air temperature on February 28, 2021 (°C)')
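The same data are read and plotted using Python as follows:
from netCDF4 import Dataset
import matplotlib.pyplot as plt

# Insert the path to the downloaded NetCDF file between the first pair of quotes
dataset = Dataset('','r')
# Read the variable and drop singleton dimensions (e.g., time) to obtain a 2-D grid
global_temp = dataset.variables['air_temperature'][:].squeeze()

# Same color limits as in the MATLAB example
plt.imshow(global_temp, vmin=-50, vmax=40)
plt.title("Air temperature on February 28, 2021 (°C)")
plt.show()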
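To look up the value at a specific location rather than plot the whole map, the row and column of the enclosing 0.1° cell are needed. The sketch below is one way to compute them. It assumes a regular global grid of 1800 × 3600 cells whose first row touches 90°N and whose first column touches 180°W. This layout is an assumption on our part, so check it against the latitude and longitude variables in the file before relying on it.

```python
def grid_index(lat, lon, res=0.1):
    """Return (row, col) of the grid cell containing (lat, lon).

    Assumes row 0 starts at 90°N and column 0 starts at 180°W
    (hypothetical layout; verify against the file's coordinates).
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    n_rows = round(180 / res)
    n_cols = round(360 / res)
    # Clamp so that lat = -90 and lon = 180 fall in the last cell
    row = min(int((90.0 - lat) / res), n_rows - 1)
    col = min(int((lon + 180.0) / res), n_cols - 1)
    return row, col


# Example: cell containing a point just north-east of (0°N, 0°E)
row, col = grid_index(0.03, 0.03)
```

With the array from the Python example above, the value would then be read as `global_temp[row, col]`.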

MSWX-Past represents the historical record starting in 1979 and ending ~5 days from real-time (updated daily). MSWX-NRT represents the near real-time extension to ~3 hours from real-time (updated every 6 hours). MSWX-Mid represents the 30-member medium-range (up to 10 days) forecast ensemble (updated daily). MSWX-Long represents the 51-member long-range (up to 7 months) forecast ensemble (updated monthly). See the MSWX paper for more details.
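The division of the timeline among the four sub-products can be summarized in a few lines of Python. The thresholds below are rounded from the description above (5 days, 3 hours, 10 days, and 7 months taken as 7 × 30 days), and the function name is hypothetical. Actual latencies vary, so treat this as a rough guide rather than a rule used by the production system.

```python
def subproduct_for(hours_from_now):
    """Return the sub-product expected to cover a time offset, or None.

    Negative offsets lie in the past, positive offsets in the future.
    Thresholds are approximations of the latencies and horizons in the text.
    """
    if hours_from_now <= -5 * 24:       # older than ~5 days
        return "MSWX-Past"
    if hours_from_now <= -3:            # between ~5 days and ~3 hours ago
        return "MSWX-NRT"
    if hours_from_now <= 10 * 24:       # up to 10 days ahead
        return "MSWX-Mid"
    if hours_from_now <= 7 * 30 * 24:   # up to ~7 months ahead
        return "MSWX-Long"
    return None                         # beyond the forecast horizon


for offset in (-24 * 365, -24, 48, 24 * 60, 24 * 400):
    print(offset, subproduct_for(offset))
```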

MSWX-Past and MSWEP are both based on bias-corrected and downscaled atmospheric model output, but MSWEP additionally incorporates gauge and satellite data. Precipitation estimates from MSWEP are thus likely more accurate than those from MSWX-Past in (a) densely gauged regions and (b) convection-dominated regions due to the satellite data.

Note that the two products are largely compatible, as MSWEP inherits the reference CDF from MSWX-Past. MSWX-NRT, -Mid, and -Long can thus be used to extend MSWEP to the present and into the future. Minor inconsistencies may, however, be present due to the inclusion of satellite and gauge data in MSWEP.
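To make the role of a shared reference CDF more concrete, the sketch below shows generic empirical quantile mapping: a value is converted to its non-exceedance probability in a source sample and then replaced by the reference value with the same probability. This is a textbook technique chosen for illustration and is not necessarily the procedure used for MSWX or MSWEP. The sample values and the function name are hypothetical.

```python
from bisect import bisect_right


def quantile_map(value, source_sample, reference_sample):
    """Map `value` from the source distribution onto the reference one.

    Uses empirical CDFs built from the two samples (generic quantile
    mapping; an illustrative stand-in, not the MSWX algorithm).
    """
    src = sorted(source_sample)
    ref = sorted(reference_sample)
    if not src or not ref:
        raise ValueError("both samples must be non-empty")
    # Number of source values <= value, i.e. n * empirical CDF
    rank = bisect_right(src, value)
    # Index of the matching reference quantile: ceil(rank * m / n) - 1,
    # computed with integers to avoid floating-point rounding issues
    idx = (rank * len(ref) + len(src) - 1) // len(src) - 1
    idx = max(0, min(idx, len(ref) - 1))
    return ref[idx]


# Hypothetical samples: the source runs ten times lower than the reference
source = [1, 2, 3, 4, 5]
reference = [10, 20, 30, 40, 50]
mapped = quantile_map(3, source, reference)
```

Because both products would be mapped onto the same reference distribution, values from either one become statistically comparable, which is the property that allows one record to extend the other.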

The choice between precipitation from MSWX-Past and MSWEP depends on the application and involves a tradeoff between accuracy and consistency. In general, MSWX-Past is less accurate but more consistent with MSWX-Mid and -Long, whereas MSWEP may be more accurate but less consistent with MSWX-Mid and -Long (due to the inclusion of the satellite and gauge data). For calibrating a hydrological model using historical data, for example, we would recommend using MSWEP. However, for placing MSWX-Mid data in a historical perspective, we would recommend using MSWX-Past.

WX is Morse shorthand code for the word weather.