AI in Sentiment Analysis for Asset Returns

Disclaimer

All articles are for education purposes only, and not to be taken as advice to buy/sell. Please do your own due diligence before committing to any trade or investments.

My take: AI sentiment analysis can help with short-term return prediction, but it works best as a filter inside a trading system, not as a trade signal on its own.

If you want the short answer, here it is:

  • AI models beat fixed word-count methods because they read financial text in context.
  • The edge is mostly short-term: intraday to next-day, sometimes a bit longer in less efficient names.
  • Backtest results can look very strong – including 94.5% next-day direction accuracy in one mixed-signal study, 51.02% mean annual excess return in one FinBERT-Gemini test, and 67% annual return with Sharpe 2.0 in one LLM news strategy.
  • But high accuracy does not mean high profit. Costs, delays, slippage, crowding, and drawdowns can cut results fast.
  • The signal tends to matter more in stress periods, major macro events, and in hard-to-value stocks like small caps and younger firms.
  • What I’d trust most: walk-forward testing, strict timing, point-in-time data, and cost-adjusted results.

So if you’re reading this to answer “Does AI sentiment improve asset return prediction?”, my answer is yes, sometimes – mostly over short horizons, and mostly when paired with other signals and strict risk rules.

For me, the main lesson is simple: better text classification is useful only if it survives out-of-sample tests and still makes money after costs.

How recent studies measure sentiment and test prediction

Main data sources used in the literature

The key question is simple: do these sentiment scores help predict next-day or other short-horizon returns within systematic trading frameworks?

Recent papers draw from several text sources, including financial news feeds, SEC regulatory filings such as 10-K and 10-Q reports, earnings call transcripts, corporate press releases, and social media posts from platforms like Twitter/X and StockTwits. Some also bring in market-context variables like ESG scores or Google Trends.

Timing plays a big role. Many studies only count news as a same-day signal if it passes a relevance threshold and appears before the 09:30 ET (US Eastern Time) market open. For slower data, such as ESG scores, researchers use cautious lags like T+3 to make sure the information was already public before any trade could have been made.
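Here is a minimal sketch of that timing rule. The function names, the 0.5 relevance threshold, and the choice to push post-open news to the next calendar day are my assumptions for illustration; the papers may handle these details differently, and a real pipeline would use an exchange trading calendar rather than calendar days.

```python
from datetime import date, datetime, time, timedelta

# Assumption: timestamps are already expressed in US Eastern wall-clock time.
MARKET_OPEN = time(9, 30)


def same_day_or_next(published: datetime, relevance: float,
                     min_relevance: float = 0.5):
    """Return the earliest date a news item may inform, or None if it is dropped.

    min_relevance is a hypothetical threshold chosen for this example.
    """
    if relevance < min_relevance:
        return None
    if published.time() < MARKET_OPEN:
        return published.date()
    # Published at or after the open: push to the next calendar day.
    # A real pipeline would roll forward on a trading calendar instead.
    return published.date() + timedelta(days=1)


def usable_from(as_of: date, lag_days: int = 3) -> date:
    """Earliest date a slow-moving value (for example an ESG score) may be used.

    Assumption: the T+3 lag is counted in calendar days here.
    """
    return as_of + timedelta(days=lag_days)
```

The point of keeping both rules in code is that the backtest can never "see" an item earlier than a trader could have.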

Social media needs extra filtering too. Some papers apply Rank-Based Weighting with Time Decay (RBWTD), which gives more weight to posts from higher-impact accounts and to newer tweets.
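To make the idea concrete, here is a rough sketch of a rank-and-decay weighting. The exact RBWTD formula is not given above, so the 1/rank weight, the exponential decay, and the six-hour half-life are all assumptions of mine, not the published method.

```python
import math


def rbwtd_score(posts, half_life_hours=6.0):
    """Weighted average sentiment across posts.

    posts: list of (sentiment, impact_rank, age_hours) tuples, where
    impact_rank 1 is the highest-impact account.
    Assumed weighting: (1 / rank) * exp(-decay * age).
    """
    decay = math.log(2) / half_life_hours
    weighted_sum = 0.0
    weight_total = 0.0
    for sentiment, rank, age_hours in posts:
        weight = (1.0 / rank) * math.exp(-decay * age_hours)
        weighted_sum += weight * sentiment
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


# A fresh bullish post from the top-ranked account outweighs
# an older bearish post from the second-ranked account.
score = rbwtd_score([(1.0, 1, 0.0), (-1.0, 2, 6.0)])
```

With these two posts the weights are 1.0 and 0.25, so the blended score stays clearly positive.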

All of this sets up the next step: testing whether the sentiment signal has any short-term link to returns.

Models used to turn text into return signals

FinBERT is the main model for financial text because it handles finance-specific wording better than general-purpose language models. LLMs such as Google Gemini are often used as a second filter to remove items that sound dramatic but do little for price-direction prediction. In one 2026 study, a FinBERT-Gemini data funnel screened more than 9,000,000 data points from SEC filings and financial news, then kept just 10,400 high-confidence signals, roughly 0.1% of the starting pool.
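The funnel idea can be sketched as a two-stage filter. This is only an illustration: the dictionary keys, the 0.9 confidence cut-off, and the `material` flag standing in for an LLM judgement are hypothetical, and no real model is called.

```python
def funnel(items, min_confidence=0.9, second_stage=None):
    """Two-stage filter over pre-scored text items.

    Each item is a dict with hypothetical keys 'label' and 'confidence',
    standing in for the output of a sentiment classifier. second_stage
    stands in for an LLM check: any callable returning True to keep an item.
    """
    stage_one = [
        it for it in items
        if it["label"] != "neutral" and it["confidence"] >= min_confidence
    ]
    if second_stage is None:
        return stage_one
    return [it for it in stage_one if second_stage(it)]


# Made-up sample data for illustration.
sample = [
    {"id": 1, "label": "positive", "confidence": 0.97, "material": True},
    {"id": 2, "label": "neutral", "confidence": 0.99, "material": False},
    {"id": 3, "label": "negative", "confidence": 0.62, "material": True},
    {"id": 4, "label": "negative", "confidence": 0.95, "material": False},
]
kept = funnel(sample, second_stage=lambda it: it["material"])
retention = len(kept) / len(sample)
```

Tracking the retention rate at each stage is a cheap way to see how aggressive the filter really is.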

Some studies add a Temporal Fusion Transformer (TFT) with a Support Vector Regression (SVR) residual corrector. The goal is to reduce forecast error when the market shifts from one regime to another.

| Model Type | Primary Role | Key Advantage |
| --- | --- | --- |
| FinBERT | Fast sentiment screening | Reads finance-specific nuance |
| LLM (Gemini / GPT-4) | Signal filtering | Better context judgement and practical filtering |
| TFT + SVR | Time-series return forecasting | Residual correction during regime shifts |
| VADER | Initial rule-based scoring | Fast and simple for social media text |

At the end of the day, model choice matters less than one thing: does the signal still work when tested on unseen data?

How studies validate performance

A backtest means little if the model has already, in effect, seen the answers. That’s why the stronger papers use out-of-sample testing. A common setup is walk-forward validation, where the model trains on 252 days and then tests on the next 10 days, without access to future data during training.
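A walk-forward split with those window sizes can be sketched like this. I am assuming a rolling (fixed-length) training window that steps forward by the test length each time; some papers may use an expanding window instead.

```python
def walk_forward_splits(n_obs, train=252, test=10):
    """Yield (train_indices, test_indices) pairs as ranges.

    The training window always ends right before the test window starts,
    so no test-period data can leak into training.
    """
    start = 0
    while start + train + test <= n_obs:
        train_idx = range(start, start + train)
        test_idx = range(start + train, start + train + test)
        yield train_idx, test_idx
        start += test  # roll forward by one test block


splits = list(walk_forward_splits(282))
```

With 282 observations this gives three folds, each training on 252 days and testing on the 10 days that follow.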

Execution timing is also kept strict. Signals usually face a t+1 execution lag, so a signal generated today can only be traded at tomorrow’s opening price.
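In code, the lag is just an index shift. The sketch below assumes a one-day holding period (enter at the next open, exit at the open after that), which is my simplification and not something the studies specify.

```python
def next_open_returns(signals, opens):
    """Returns from acting on each signal one bar later.

    signals[t] is in {-1, 0, 1} and is known only after the close of day t.
    The position is opened at opens[t + 1] and, as a simplifying
    assumption, closed at opens[t + 2].
    """
    out = []
    for t in range(len(opens) - 2):
        held = opens[t + 2] / opens[t + 1] - 1.0
        out.append(signals[t] * held)
    return out


# Made-up prices: long signal on day 0, short signal on day 1.
lagged = next_open_returns([1, -1, 0, 0], [100.0, 110.0, 121.0, 108.9])
```

Note that the day-0 signal never touches the day-0 price. That is the whole point of the lag.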

Researchers also rebuild the historical universe on a point-in-time basis to avoid survivorship bias. On the statistics side, many papers now use Newey-West adjusted t-statistics and HAC-robust Diebold-Mariano tests to check whether the reported alpha is more than random noise. In one case, the strategy’s Newey-West t-statistic was 4.01. Transaction costs are often modelled at 4–10 basis points per trade.
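For readers who want to see what a Newey-West t-statistic actually computes, here is a small pure-Python sketch for the mean of a return series. It uses the standard Bartlett-kernel weights; the default of 5 lags is my assumption, and in practice a statistics library would usually do this for you.

```python
import math


def newey_west_tstat(returns, lags=5):
    """t-statistic for the mean return with a Newey-West (HAC) variance."""
    n = len(returns)
    mean = sum(returns) / n
    dev = [r - mean for r in returns]

    def autocov(k):
        # Sample autocovariance at lag k (divided by n, not n - k).
        return sum(dev[t] * dev[t - k] for t in range(k, n)) / n

    long_run_var = autocov(0)
    for k in range(1, min(lags, n - 1) + 1):
        weight = 1.0 - k / (lags + 1.0)  # Bartlett kernel
        long_run_var += 2.0 * weight * autocov(k)

    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate")
    return mean / math.sqrt(long_run_var / n)


sample_returns = [0.01, 0.03, 0.02, 0.04]  # made-up numbers
t_plain = newey_west_tstat(sample_returns, lags=0)
t_hac = newey_west_tstat(sample_returns, lags=1)
```

With `lags=0` this collapses to the ordinary t-statistic (using the 1/n variance), which makes it easy to see how much the autocorrelation adjustment changes the answer.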

Those checks act like a stress test. If the signal fails here, any return claim falls apart fast. If it holds up, then it can be stacked against a plain buy-and-hold benchmark.

What recent research says about AI sentiment and asset returns

Evidence for short-term stock direction and return prediction

Once out-of-sample validation is in place, the next step is simple: which sentiment signals actually help predict returns and stock picks?

Recent research points in the same direction. Sentiment tends to work best when it sits inside a mixed signal, not on its own. One study used a Temporal Fusion Transformer (TFT) model that combined news sentiment, ESG data, and technical indicators. It reported 94.5% directional accuracy on next-day log returns for US tech equities and BTC/ETH.

That said, the edge isn’t evenly spread across the market. Sentiment usually has more pull in hard-to-value stocks, especially small caps and younger firms. Why? Information tends to travel more slowly there, and arbitrage is harder to carry out. In a study covering 3,955 US firms, stocks in the highest sentiment decile had a 32% chance of staying there in the following month. The effect was strongest among hard-to-value stocks, and it reversed within 7–12 months.

LLM-based trading simulations

More recent work has also tested LLM-based pipelines across multi-year backtests. From February 2015 to June 2021, researchers applied the FinDPO framework (built on Meta’s Llama-3 and aligned using Direct Preference Optimisation) to 204,017 financial news articles on the S&P 500. The model turned LLM outputs into continuous sentiment scores. In that setup, the resulting strategy delivered a 67% annual return with a Sharpe ratio of 2.0 after 5 basis points (bps) in transaction costs. FinDPO also improved sentiment classification accuracy by 11% on average against FinLlama.
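A cost-adjusted Sharpe ratio is simple to compute, and it helps to see what "after 5 bps" means mechanically. The sketch below is my own generic version, not the paper's code: it charges a proportional cost on turnover, assumes a zero risk-free rate, and annualises with 252 periods.

```python
import math
import statistics


def net_sharpe(gross_returns, turnover, cost_bps=5.0, periods_per_year=252):
    """Annualised Sharpe ratio after proportional trading costs.

    turnover[t] is the fraction of the portfolio traded in period t.
    Cost is charged per unit of turnover; risk-free rate is taken as zero.
    """
    cost = cost_bps / 10_000.0
    net = [r - cost * u for r, u in zip(gross_returns, turnover)]
    return (statistics.fmean(net) / statistics.stdev(net)
            * math.sqrt(periods_per_year))


daily = [0.01, -0.005, 0.007, 0.002]  # made-up daily returns
gross = net_sharpe(daily, [0.0] * 4)
after_costs = net_sharpe(daily, [1.0] * 4, cost_bps=5.0)
```

The higher the turnover, the more those few basis points bite, which is why the cost assumption deserves as much attention as the headline return.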

Another paper looked at a FinBERT-Gemini hybrid focused on the top 50 S&P 500 constituents. Over a 16-year testing period, it reported a 51.02% mean annual excess return, along with positive skewness of 6.11.

These figures come from simulations, so there’s a catch. Live results can weaken once you factor in execution frictions, market impact, and regime changes.

The table below compares the main studies side by side:

| Study / Model | Data Source | Model Type | Asset Class | Prediction Target | Main Finding |
| --- | --- | --- | --- | --- | --- |
| TFT + SVR Hybrid | News, ESG, Macro, Technicals | Temporal Fusion Transformer + SVR | US Tech Equities, Global Indices, BTC/ETH | Next-day log returns | 94.5% directional accuracy; sentiment dominates in turbulent periods. |
| FinBERT + Gemini | SEC filings and financial news | Hybrid Discriminative + Generative AI | S&P 500 constituents | Market-neutral alpha | 51.02% annual excess return; positive skewness of 6.11. |
| FinDPO (Llama-3) | Financial news | LLM with Direct Preference Optimisation | S&P 500 | Portfolio returns | 67% annual return; Sharpe ratio of 2.0 under 5 bps costs. |

When sentiment signals carry more weight: stress periods and major events

Sentiment signals tend to matter more when markets get shaky: sentiment plays the bigger role in volatile periods, while ESG has more influence in calmer conditions.

One clear example came on 15 June 2022, when the Federal Reserve lifted rates by 75 basis points. An event study found that a sentiment-augmented Fama-French five-factor model explained abnormal returns much better than the baseline model during that period. Researchers are also paying more attention to sentiment volatility – the spread of opinions across sources. During uncertainty shocks, disagreement across headlines can move prices more than headline tone by itself.
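One simple way to separate tone from disagreement is to track the average score and the spread across sources side by side. Using the population standard deviation as the "sentiment volatility" measure is my assumption; the research may define it differently.

```python
import statistics


def tone_and_dispersion(source_scores):
    """Mean tone and cross-source dispersion for one asset on one day.

    source_scores holds one sentiment score in [-1, 1] per news source.
    """
    tone = statistics.fmean(source_scores)
    dispersion = statistics.pstdev(source_scores)
    return tone, dispersion


# Made-up scores: same average-looking day, very different disagreement.
agree = tone_and_dispersion([0.5, 0.5, 0.5])
split = tone_and_dispersion([1.0, -1.0])
```

In the second case the average tone is flat, yet the sources disagree as much as they possibly can. A tone-only model would miss that.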

That has a plain takeaway for model design: a sentiment model trained on quiet-market data may act very differently during an earnings shock, a macro surprise, or a sudden liquidity event.

What the evidence means for traders and portfolio decisions

Why directional accuracy does not guarantee profits

For traders, the main question isn’t whether sentiment can label text as positive or negative. It’s whether that signal still works after costs, delays, and risk are taken into account. High directional accuracy on its own doesn’t mean a strategy makes money. AI sentiment only helps when it leads to net, risk-adjusted returns, not just better classification scores.

That gap matters more than it seems. Even small trading costs can wipe out the edge in high-frequency strategies. And a system with strong headline returns can still come with painful drawdowns and sharp volatility. Fast text-based signals are hit hardest.
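A quick bit of arithmetic shows the gap. The numbers below are made up for illustration and do not come from any of the studies: expected return per trade is hit rate times average win, minus miss rate times average loss, minus costs.

```python
def expected_net_per_trade(hit_rate, avg_win, avg_loss, round_trip_cost_bps):
    """Expected return per trade after costs.

    avg_win and avg_loss are positive fractions (0.004 means 0.4%).
    """
    gross = hit_rate * avg_win - (1.0 - hit_rate) * avg_loss
    return gross - round_trip_cost_bps / 10_000.0


# Illustrative numbers only.
# 55% hit rate, symmetric 0.4% moves, 10 bps round-trip cost.
modest_edge = expected_net_per_trade(0.55, 0.004, 0.004, 10)
# 90% hit rate, but small wins and rare large losses, zero cost.
fat_left_tail = expected_net_per_trade(0.90, 0.001, 0.012, 0)
```

Both cases come out negative. The first loses to costs, and the second loses to payoff asymmetry even with a 90% hit rate.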

These signals also fade fast. If execution is delayed, the market may have already priced in the move before you get in. On top of that, false positives can drag results down. So can overnight gap risk. And when more firms start using similar tools, crowding can eat into what used to work.

Where sentiment fits in a systematic trading process

In practice, sentiment tends to work best as a filter, not a trigger. Recent studies show that hybrid setups do better when they combine sentiment with technical tools like the 50-day Simple Moving Average (SMA) and the Relative Strength Index (RSI), along with regime filters such as the VIX. Put simply, sentiment can point you to names worth watching. Trend and risk filters should decide whether the trade is worth taking.
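Here is a rough sketch of sentiment used as a filter alongside trend and regime checks. Every threshold (sentiment 0.5, RSI 70, VIX 30) is a hypothetical placeholder, and the RSI uses simple averages rather than Wilder's smoothing to keep the example short.

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window


def rsi(prices, period=14):
    """RSI using simple averages of gains and losses (not Wilder's smoothing)."""
    recent = prices[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def take_long(sentiment, prices, vix,
              min_sentiment=0.5, max_rsi=70.0, max_vix=30.0):
    """Sentiment nominates the name; trend and regime filters decide."""
    if len(prices) < 50:
        return False
    return (sentiment >= min_sentiment
            and prices[-1] > sma(prices, 50)  # price above 50-day SMA
            and rsi(prices) < max_rsi         # not overbought
            and vix < max_vix)                # not an extreme-volatility regime


# Made-up uptrend with regular pullbacks.
prices = [100 + i + (3 if i % 2 == 0 else 0) for i in range(60)]
```

Here `take_long(0.8, prices, 18.0)` passes every check, while the same call with a VIX of 35 or a sentiment score of 0.2 is rejected.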

That makes sense in live markets. Buying on positive sentiment while the chart is already breaking down is usually a bad idea. Shorting on negative sentiment in a crowded trade can also backfire, especially when borrow costs are high enough to wipe out the edge.

Given that AI sentiment usually has its edge on an intraday to next-day basis, it’s better suited to ranking opportunities, confirming trends, and controlling risk than to generating standalone trades.

| Signal Type | Role in a Systematic Process | Key Risk if Used Alone |
| --- | --- | --- |
| AI Sentiment | Identifies potential directional bias | False positives; regime shifts |
| Technical Indicators (SMA, RSI) | Confirms trend alignment | Late in fast markets |
| Macro Filters (VIX) | Helps suspend trading during extreme volatility | May miss early reversals |
| Risk Rules (Stop-Loss) | Limits downside from large adverse moves | Cannot prevent gaps |

The remaining issue is where these models still break down.

Limits, research gaps and conclusion

Main limitations in current studies

The gains are real, but they depend on clean data, strict timing, and market-specific testing.

The biggest issue is data quality. Most financial news and regulatory filings are routine and not useful for trading, so the signal-to-noise ratio is low. In one study, researchers cut 9,000,000 data points down to just 10,400 usable signals.

Look-ahead bias still makes many backtests look better than they are. If time ordering is not handled with care, simulated results can overstate live returns. And even when the setup is done properly, the pain can still be steep: some simulations showed maximum drawdowns above 64%.
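Maximum drawdown is worth computing yourself on any backtest you are shown. A minimal version, assuming a list of simple per-period returns compounded from a starting value of 1.0:

```python
def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Made-up path: up 10%, down 50%, up 20%.
dd = max_drawdown([0.10, -0.50, 0.20])
```

In this example the curve ends at 0.66 of a 1.10 peak, but the worst point was a 50% fall, and that is the number you would have had to sit through.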

Clean timing alone does not fix everything. A signal can still break down if the language model reads tone the wrong way. Language drift chips away at accuracy over time as finance-related wording changes. Even transformer-based models can struggle with sarcasm, irony, and other context-heavy phrasing.

Generalisation is still a weak spot. Results from US datasets do not automatically carry over to Singapore-listed equities. If you want to use the same setup on Singapore-listed stocks, you need retraining and out-of-sample checks first.

Key takeaways for future research and practical use

This is why the strongest use case is short-term decision support, not standalone forecasting.

For practitioners, validation matters more than the headline return. Walk-forward testing with windows such as 252 days of training followed by 10 days of testing, along with within-fold scaling and strict “as-of” data lags, is the baseline to ask for. If a result skips those checks, treat it with caution.
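Within-fold scaling is the check people most often get wrong, so here is a small sketch. The idea is that the mean and standard deviation come from the training window only and are then applied unchanged to the test window. The function name is my own.

```python
import statistics


def scale_within_fold(train, test):
    """Standardise a feature using statistics from the training window only."""
    mu = statistics.fmean(train)
    sd = statistics.pstdev(train) or 1.0  # avoid dividing by zero
    scaled_train = [(x - mu) / sd for x in train]
    scaled_test = [(x - mu) / sd for x in test]
    return scaled_train, scaled_test


# Made-up feature values: the test point sits outside the training range.
tr, te = scale_within_fold([1.0, 2.0, 3.0], [4.0])
```

Scaling the full dataset in one go would quietly feed test-period information into training, which is a form of look-ahead bias.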

Sentiment works best as one input inside a rules-based process, not as a standalone signal.

FAQs

Can AI sentiment work in live trading?

Yes. Recent research shows that tools like FinBERT and LLMs can generate real-time trading signals and help predict short-term reversals.

That said, whether it pays off in practice depends on a few hard-nosed factors: market conditions, data quality, and processing speed. Then there’s the stuff traders deal with every day – slippage, transaction costs, latency, and order execution.

So the short version is simple: the models can help, but they don’t trade in a vacuum. In live markets, they tend to work best when paired with tight risk management and solid execution controls.

Which assets benefit most from sentiment signals?

Assets that tend to gain the most are usually those with high volatility and a strong link to market psychology. That often includes cryptocurrencies and stocks during choppy periods.

Large-cap equities can also gain from this, especially in sectors shaped by ESG themes and macroeconomic news. As a rule of thumb, assets with high trading volume, fast-moving information, and stronger behavioural effects tend to get the most out of AI-driven sentiment signals.

How can I test sentiment signals properly?

Use a strict validation framework to tell apart genuine effects from spurious correlations.

A few checks matter most:

  • Placebo tests
  • Random common cause tests
  • Subset stability tests
  • Bootstrap confidence intervals

These checks help you see whether the signal holds up, isn’t due to chance, and goes beyond a plain correlation. It also helps to document the results and set minimum pass criteria before deployment.
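Two of those checks are easy to sketch with the standard library. This is a simplified illustration under my own assumptions (a percentile bootstrap of the mean return, and a placebo test that shuffles the signal against fixed returns), not a full validation suite.

```python
import random
import statistics


def bootstrap_mean_ci(returns, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean return."""
    rng = random.Random(seed)
    n = len(returns)
    means = sorted(
        statistics.fmean(rng.choices(returns, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def placebo_p_value(signals, returns, n_shuffles=500, seed=0):
    """Share of shuffled-signal runs that match or beat the real signal."""
    rng = random.Random(seed)
    actual = sum(s * r for s, r in zip(signals, returns))
    shuffled = list(signals)
    beats = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if sum(s * r for s, r in zip(shuffled, returns)) >= actual:
            beats += 1
    # Add-one correction so the estimate is never exactly zero.
    return (beats + 1) / (n_shuffles + 1)
```

If the confidence interval straddles zero, or shuffled signals do about as well as the real one, the "edge" is probably noise. Note that plain resampling ignores autocorrelation, so a block bootstrap may be the better choice for daily returns.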

Bryan Ang

Bryan Ang is a financial expert with a passion for investing and trading. He is an avid reader and researcher who has built an impressive library of books and articles on the subject.
