The New York Times 'Connections': A Challenging Benchmark for LLM Training

A study by Tuhin Chakrabarty, an assistant professor in the Department of Computer Science at Stony Brook, along with researchers from Columbia University, finds that the New York Times word game 'Connections' is a tough yardstick for training large language models (LLMs) in abstract reasoning. The finding challenges the common perception that AI and machine learning always outshine human capabilities.

AI's Performance in 'Connections'

While AI often dominates in games like chess, the study tells a different story for 'Connections'. Even the top-performing LLM, Claude 3.5 Sonnet, solves only 18% of the games. The research analyzed AI responses to over 400 'Connections' games and found that both novice and expert human players outperform AI at the puzzle.

In the game, players are presented with a 4×4 grid of 16 words and tasked with grouping them into four clusters of four words each based on shared characteristics. For instance, 'Followers,' 'Sheep,' 'Puppets,' and 'Lemmings' form a group because they are all 'Conformists.' Grouping words accurately requires reasoning with various forms of knowledge, including semantic and encyclopedic knowledge.

Tuhin Chakrabarty emphasizes that although the task may seem straightforward, many words fit more than one category, which creates red herrings. This is precisely what makes the game engaging.
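The grouping rule described above can be made concrete with a small checker. This is only an illustrative sketch, not the study's evaluation code: the function names are hypothetical, and only the 'Conformists' group comes from the article, while the other three groups are invented to fill out a 16-word grid.

```python
from typing import Dict, List, Set

# Hypothetical puzzle: only the "Conformists" group comes from the article;
# the other three groups are invented for illustration.
SOLUTION: Dict[str, Set[str]] = {
    "Conformists": {"followers", "sheep", "puppets", "lemmings"},
    "Colors": {"red", "green", "blue", "yellow"},
    "Planets": {"mars", "venus", "saturn", "neptune"},
    "Fruits": {"apple", "pear", "plum", "mango"},
}


def is_valid_guess(guess: List[List[str]], solution: Dict[str, Set[str]]) -> bool:
    """A guess must be four groups of four that use each of the 16 words once."""
    all_words = set().union(*solution.values())
    guessed = [word.lower() for group in guess for word in group]
    return (
        len(guess) == 4
        and all(len(group) == 4 for group in guess)
        and len(guessed) == 16
        and set(guessed) == all_words
    )


def count_correct_groups(guess: List[List[str]], solution: Dict[str, Set[str]]) -> int:
    """Count guessed groups that exactly match one of the solution groups."""
    targets = {frozenset(words) for words in solution.values()}
    return sum(frozenset(w.lower() for w in group) in targets for group in guess)


def is_solved(guess: List[List[str]], solution: Dict[str, Set[str]]) -> bool:
    """A game counts as solved only when all four groups are correct."""
    return is_valid_guess(guess, solution) and count_correct_groups(guess, solution) == 4


# A guess that swaps "sheep" and "apple" gets two groups right but does not solve the game.
mixed = [
    ["followers", "apple", "puppets", "lemmings"],
    ["red", "green", "blue", "yellow"],
    ["mars", "venus", "saturn", "neptune"],
    ["sheep", "pear", "plum", "mango"],
]
print(count_correct_groups(mixed, SOLUTION))  # 2
print(is_solved(mixed, SOLUTION))  # False
```

Because a single misplaced word spoils two groups at once, a checker like this shows why partially solving a game is much easier than solving it fully.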

LLMs' Strengths and Weaknesses

The research also highlights that LLMs are relatively better at reasoning that involves semantic relations, such as the one linking 'happy,' 'glad,' and 'joyful.' However, they struggle with other types of knowledge, such as multiword expressions like 'to kick the bucket' (meaning 'to die') and combined knowledge about word form and meaning (adding 'un-' to 'do' creates 'undo,' with the opposite meaning).

When tested on 438 NYT Connections games, five LLMs (Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, OpenAI's GPT-4o, Meta's Llama 3.1 405B, and Mistral Large 2) could all partially solve some games, but their performance was far from perfect.

Read the full story at the AI Innovation Institute website.
