<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title>frankie-tales</title><id>https://lovergine.com/feeds/tags/science.xml</id><subtitle>Tag: science</subtitle><updated>2026-02-25T15:33:03Z</updated><link href="https://lovergine.com/feeds/tags/science.xml" rel="self" /><link href="https://lovergine.com" /><entry><title>SM-Tools, Copernicus, FOSS and the reasons of inevitable choices and drifts</title><id>https://lovergine.com/sm-tools-copernicus-foss-and-the-reasons-of-inevitable-choices-and-drifts.html</id><author><name>Francesco P. Lovergine</name><email>mbox@lovergine.com</email></author><updated>2026-02-25T15:00:00Z</updated><link href="https://lovergine.com/sm-tools-copernicus-foss-and-the-reasons-of-inevitable-choices-and-drifts.html" rel="alternate" /><content type="html">&lt;p&gt;Here at work, we develop a series of tools for geospatial processing with
multiple goals, expected maintenance durations, different scopes, generalization
needs, and motivations. Note that about 10 years ago, here in Europe, a
completely new approach to upstream and downstream services for Earth
Observation began: ESA changed its data licenses, and distribution and access
modalities entered the Big Data era.&lt;/p&gt;&lt;p&gt;That change even affected the data approach of academic institutions and
triggered a major shift in the daily work of researchers, with access to a
tremendous volume of weekly data available almost just-in-time and worldwide. Of course, such a
change also impacted us, and we had to adapt our processing and storage
capabilities to the new era.&lt;/p&gt;&lt;p&gt;One of my side projects in that regard is
&lt;a href=&quot;https://baltig.cnr.it/francesco.lovergine/sm-tools&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;SM-Tools&lt;/a&gt;, which consists
primarily of a collection of support tools for running our internally developed
soil moisture algorithm using SAR satellite data (&lt;a href=&quot;https://sarwater.irea.cnr.it/smosar.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;SMOSAR&lt;/a&gt;).
One such tool (now named &lt;code&gt;smt_copernicus&lt;/code&gt;) began more than
10 years ago, when the Sentinel constellation started operations, and has
evolved through multiple restarts from scratch. Its purpose is to search for and
download satellite products from Copernicus archives using multiple criteria,
and to maintain an internal geospatial database of these products, along with
all derived maps and ancillary data. This is only one component of a system that
should be able to process large quantities of multi-source data on selected
areas of interest, create downstream products, and calibrate and analyze results
by comparing them with field data. The clear final goal is to achieve new
findings in satellite data analysis, supported by extended processing worldwide,
and to introduce new algorithms.&lt;/p&gt;&lt;p&gt;This is a long-term goal that unfortunately runs up against short- to mid-term
difficulties of accessing archives that are not under our direct control. The
sad reality is that in the last 4 years the Copernicus archive access modality
changed 3 times, and in the previous period Copernicus also changed policies
and modalities along the way (e.g., by introducing online and offline products,
changing formats, etc.). Geospatial communities are small enough to encounter
more practical difficulties than expected under such operational conditions, and
this is now an almost weekly experience. We now have to chase other parties’
changes more often than in the past, rather than working on our own
goals.&lt;/p&gt;&lt;p&gt;For instance, until 2023, the main package used for accessing the Copernicus
archive was the &lt;a href=&quot;https://github.com/sentinelsat/sentinelsat&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Sentinelsat&lt;/a&gt;
Python package (developed by a handful of willing
scholars starting in the summer of 2015). It became abandonware that year, when
people discovered that the protocol changes required rewriting most of the
package from scratch, including all test code and mocks. That happened again
at the end of last year. Incredibly enough, all access protocols have been a
moving target since 2023, even in non-secondary details, and have required frequent
adjustments to avoid unexpected breakage in download and processing pipelines.
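&lt;/p&gt;&lt;p&gt;To give an idea of the moving parts involved, here is a minimal sketch of the
kind of search request such a download pipeline issues against a STAC-style
catalogue (the endpoint, collection name, and bounding box below are
illustrative assumptions, not the actual values used by our tools):&lt;/p&gt;

```python
# Hypothetical sketch: building and issuing a STAC item-search request.
# The endpoint and collection name are assumptions for illustration only.
import json
import urllib.request

STAC_ENDPOINT = "https://example.org/stac/search"  # placeholder endpoint


def build_search(bbox, start, end, collection, limit=100):
    """Build a STAC /search request body for an area and time range."""
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",
        "limit": limit,
    }


def search_items(payload, endpoint=STAC_ENDPOINT):
    """POST the search body and return the matching items (STAC features)."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["features"]


# Example: Sentinel-1 products over an area of interest in January 2024.
payload = build_search(
    bbox=(16.3, 40.9, 17.0, 41.3),
    start="2024-01-01T00:00:00Z",
    end="2024-01-31T23:59:59Z",
    collection="SENTINEL-1",
)
```

&lt;p&gt;Keeping the request-building step separate from the network call is what makes
the provider churn described above survivable: when the protocol changes yet
again, only this thin layer has to be rewritten.&lt;/p&gt;&lt;p&gt;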
That certainly does not encourage community-supported FOSS solutions. Exactly in
2023, when I discovered our Sentinel-related tool had to be deeply changed
because the obsolete Sentinelsat package was gone, I decided that enough
was enough and migrated from Python to a fully self-supported Perl
reimplementation. One of the things I have always hated in the Python ecosystem is
the excessive (for me and our purposes, at least) speed of deprecation of
consolidated packages and features, and the prospect of having to chase
unexpected changes on both the Copernicus AND Python sides was out of the question. My
experiences with Perl have been much less annoying in this regard, with scripts
still running perfectly 20 years after they were written. Let’s consider
this the old-school approach: if something is working, don't touch it without a
more than valid reason, and even then, think twice before you touch.&lt;/p&gt;&lt;p&gt;In the meantime, around 2019, another independent effort started to support
Copernicus access, along with a few other data providers. That’s about 4 years
after the original Sentinelsat project, and timing is essential
here. &lt;a href=&quot;https://github.com/CS-SI/eodag&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;EODag&lt;/a&gt;
is a single-company FOSS product that has been actively
developed since then, but could be considered stable only 1-2 years later. Of
course, it only provides the usual access layer, and using that package implied,
at the time, replacing Sentinelsat with EODag as a base layer for searching and
downloading only, while performing all other tasks with self-made
code. Before 2023, Sentinelsat and EODag were equivalent for performing
the same tasks, with very little advantage either way.&lt;/p&gt;&lt;p&gt;Note that both tools were, in any case, pretty adequate but not enough for our
goals, and also had a few defects (or rather, a lack of flexibility) to be
managed in some creative way. That was one of the reasons for not replacing
Sentinelsat with EODag in 2023. The other major one was that replacing a small
package with another small package (as with Sentinelsat, there are just a
couple of main contributors to the codebase, along with a good number of
pending issues and PRs for such a kind of product) is probably not the safest
way to avoid problems in the near future, when Copernicus will change things
again (see the upcoming Earth Observation Processing Framework (EOPF) data
format, based on Zarr). And of course, EODag is written in Python, and I have already
expressed my concerns about that.&lt;/p&gt;&lt;p&gt;Whether you like it or not, nowadays the concrete alternative to adopting small
FOSS projects to perform basic tasks is to use AI tooling to create a perfectly
(or almost so) tailored implementation for the target task. While in 2023 I had
to rewrite from scratch (in maybe a month of work, including some fixes to the
&lt;a href=&quot;https://metacpan.org/dist/Geo-GDAL-FFI&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Geo::GDAL::FFI package&lt;/a&gt;
for a multi-threading issue) a
multi-threaded tool for accessing the Copernicus archive and maintaining an
internal geospatial database of products consistently across multiple
generations of the archive, I implemented the STAC protocol variant in a few
hours instead, thanks to Claude Code-based patching and my own review and
testing of the resulting codebase. As I said in &lt;a href=&quot;/is-ai-driven-coding-the-start-of-the-end-of-mainstream-foss&quot;&gt;this
post&lt;/a&gt;, currently
the cons of adopting a small third-party FOSS solution outweigh the pros,
particularly regarding the resulting technical debt, compared with a
well-conducted, self-consistent AI-based development process.&lt;/p&gt;&lt;p&gt;In my own side projects I am seeing exactly a mirror of what will probably be
the reality of FOSS in the near future, as I mentioned in the previous post:
relatively few major/interesting projects will be adopted by
others and attract contributions, while most codebases will become pure one-man
shows, with AI tooling.&lt;/p&gt;&lt;p&gt;A significant part of geospatial processing involves data procurement and
processing, i.e., refining and preparing data and images in order to collect,
filter, and process large volumes of input for subsequent analysis. This is the
most annoying and repetitive part of the process, and often the most
time-consuming as well. In my experience, working on those tasks is probably the most
effective way to use LLMs, through a spec- and test-driven design. Whether you
like it or not, it is the most immediate way to produce working code by
iterating through a chain of thought and careful review of the results, including
decent test coverage. As observed in &lt;a href=&quot;https://antirez.com/news/159&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Antirez's
experience&lt;/a&gt;, the AI agent also had the nice
ability to retain my Perl style (which is no small thing, given that Perl has
many programming flavors and variants).&lt;/p&gt;&lt;p&gt;Maybe the final result will be an increase in quick-and-dirty codebases, but
for many scholars, it will be a major simplification of their lives. In most
domains of science, coding activities have been seen as an inevitable evil: they
are a tool, not the primary goal, and even before the advent of LLMs, most
scientific codebases were far from something to be proud of. The Copernicus
attitude to FAFO will also encourage such approaches, simply because scholars
don't have time to waste chasing changes introduced by this or that data
provider or company when contracts change.&lt;/p&gt;&lt;p&gt;An AI slop attitude? No, simple survival instinct.&lt;/p&gt;</content></entry><entry><title>Does HPC mean High-Pain Computing?</title><id>https://lovergine.com/does-hpc-mean-high-pain-computing.html</id><author><name>Francesco P. Lovergine</name><email>mbox@lovergine.com</email></author><updated>2025-09-06T19:40:00Z</updated><link href="https://lovergine.com/does-hpc-mean-high-pain-computing.html" rel="alternate" /><content type="html">&lt;p&gt;Please forgive the silly joke in the title of this semi-serious post, but
lately I have been thinking about the strange fate of an area of general
computing that I have spent more and more time in recently, as I did in the near and
far past. For my job, I have used a series of scientific HPC clusters
worldwide to solve multiple computing problems efficiently by distributing
computation across numerous nodes. Over the last thirty years, all such
platforms have consistently shared the same common characteristics, which
invariably pose a problem for the average scientist
(often a young/junior researcher dedicated to a short-term project) in any
application domain.&lt;/p&gt;&lt;p&gt;&lt;img src=&quot;/images/high-pain-computing.jpg&quot; alt=&quot;HPC means high-pain computing&quot; /&gt;&lt;/p&gt;&lt;p&gt;To use Fred Brooks' definition, HPC technologies involve both essential and
accidental difficulties for this category of users. The essential one is due to the inner
complexity of creating a parallel and distributed solution to any problem,
possibly in a way that does not harm the final implementation due to the
increase in communication time among computational agents. This is already a
relevant problem &lt;em&gt;per se&lt;/em&gt;, which can often be beyond the abilities, knowledge, and
interests of the average researcher in bioinformatics, physics, mathematics,
remote sensing, or any other research domain.&lt;/p&gt;&lt;p&gt;The accidental difficulty is instead always due to the accessibility of the platforms and the
technologies used for their implementation. By and large, such HPC clusters are
large pools of multi-core hosts with plenty of memory, connected through
multiple high-speed networks that implement some sort of multi-tier
distributed POSIX file system and/or object storage. Users can log in on a
limited number of hosts that are connected to all the others and run some type
of scheduling system (e.g., Slurm or HTCondor), through which multiple computational nodes can
be reserved for a limited period of time to execute batch jobs, or even
interactive ones (mainly for debugging). In most cases, such clusters can also be
used with some MPI/OpenMP implementation for proper parallel computational
modeling based on message passing among computing agents that run on multiple
cores and hosts, with or without multi-threading. Alternatively, GPUs can
be reserved and exploited via CUDA/OpenCL. In many cases, such implementations
are vendor-oriented and require adopting specific libraries and
compilers that add another layer of complexity.&lt;/p&gt;&lt;p&gt;The accidental problems start when casual users discover that all such computing
nodes invariably run some legacy enterprise Linux distribution that is maintained
for a period of ten years or even more, until a full reinstallation of the whole
cluster. On top of such legacy systems (which for
any practical use are simply unusable as such), these scientific clusters offer
essentially a few different mechanisms for creating a general computational
environment:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;https://modules.readthedocs.io/en/latest/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Environment Modules&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Containers (&lt;a href=&quot;https://sylabs.io/singularity/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Singularity&lt;/a&gt; or &lt;a href=&quot;https://apptainer.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Apptainer&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;https://www.anaconda.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Anaconda/Miniconda&lt;/a&gt;-like environments (or free forks like &lt;a href=&quot;https://github.com/conda-forge/miniforge&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Miniforge&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;Some specific software/application to run&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Containers aside, the other solutions are all binary-based hubs, which can
expose users to breakage when the application being developed needs to access
exotic language bindings for extensions, plunging the poor user into the mysterious
and dangerous world of ABI violations and chains of broken dependencies. Moreover,
such hubs are not always consistent, and any upgrade by the admin team
can cause sudden breakage overnight.&lt;/p&gt;&lt;p&gt;The final solution (or apparently so) nowadays is using containerization to provide a
target environment where the user code can find all and only the correct
dependencies and versions for the whole software stack of the application. This
holds, at least, as long as the third-party hubs of base distributions and languages ensure
complete consistency and retain past binaries and versions for any
medium/long-term need. Of course, a full source-based stack with proper version
tracking &lt;em&gt;à la&lt;/em&gt; &lt;a href=&quot;https://lovergine.com/tags/guix.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;Guix&lt;/a&gt; would avoid
dependencies on external binary hubs and seems the way to go. Indeed, a small
group interested in such a solution has existed for a few years, but I am
not aware of many HPC clusters that consistently offer this kind of
implementation to users. That said, writing Guile Scheme descriptors for
preparing an execution environment may not be within the reach of the average
researcher in biochemistry or astrophysics.&lt;/p&gt;&lt;p&gt;Unfortunately, as I wrote
&lt;a href=&quot;https://lovergine.com/are-distributions-still-relevant.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;in a past post&lt;/a&gt;
on this digital site, this moves the
whole responsibility for software stack maintenance onto the shoulders of the
final users, who are often the infamous junior profiles I mentioned before.
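&lt;/p&gt;&lt;p&gt;For reference, the kind of Guile Scheme descriptor mentioned above looks
roughly like the following sketch (the package names are illustrative
assumptions; check &lt;code&gt;guix search&lt;/code&gt; for the actual ones):&lt;/p&gt;

```scheme
;; manifest.scm -- a sketch of a Guix environment descriptor.
;; Package names below are assumptions for illustration only.
(specifications->manifest
 '("gcc-toolchain"
   "openmpi"
   "gdal"
   "python"
   "python-numpy"))
```

&lt;p&gt;Invoked with something like &lt;code&gt;guix shell -m manifest.scm&lt;/code&gt;, this
reproduces the same environment from source, pinned by the Guix revision in
use, with no dependency on external binary hubs.&lt;/p&gt;&lt;p&gt;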
These are non-IT specialists who are expected to adopt such HPC platforms to implement
solutions as part of their daily job in their specific scientific domain.&lt;/p&gt;&lt;p&gt;The result, to be honest, is that the average researcher simply tries to avoid
the whole thing as soon as possible because of the significant complexity
involved, while the private sector has introduced the specialized
roles of data and software engineers to manage such problems properly (which is
the only reasonable approach, indeed). Adding insult to injury, in some
academic areas such an interest in HPC is also viewed with contempt or as a
waste of time, if not openly discouraged.&lt;/p&gt;&lt;p&gt;All this explains why a look around any of the significant HPC clusters
worldwide often guarantees hilarious experiences in terms of who is doing what,
and how.&lt;/p&gt;&lt;p&gt;Sometimes, I can almost hear them swearing...&lt;/p&gt;</content></entry></feed>